perm filename NOTES.JRA[LSP,JRA]2 blob
sn#092460 filedate 1974-03-20 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00038 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00005 00002 .SEC(Evaluation of LISP expressions,Evaluation)
C00017 00003 .SS(Sexpr translation of LISP expressions)
C00023 00004 .SS(A first look at symbol tables,symbol tables)
C00027 00005 .SS(Introduction to %3λ%*-expressions)
C00032 00006 .SS(%3λ%*-expressions,%3λ%*-expressions,P49:)
C00041 00007 .SS(Mechanization of evaluation,,P6:)
C00050 00008 .SS(Examples of %3eval%*,examples of %3eval%*)
C00054 00009 It should now be clear that %3eval%* and friends %2do%* begin to
C00059 00010 .P43:
C00065 00011 .REQUIRE "PROB8.PUB" SOURCE_FILE
C00066 00012 .SS(%3label%*,%3label%*,P38:)
C00069 00013 .SS(functional arguments,functional argument)
C00079 00014 .SS(special forms,special form,P8:)
C00084 00015 .SS(The %3prog%*-feature,,P37:)
C00097 00016 .SEC(Running on the machine)
C00102 00017 .SS(%3evalquote%*,%3evalquote%*)
C00110 00018 .SS(A project: extensions to %3eval%*,%3eval%*,P67:)
C00119 00019 .SS(A project: Syntax-directed processes,syntax-directed)
C00120 00020 .SEC(Toward better symbol tables and implementation,symbol tables)
C00141 00021 .<<pic of NIL>>
C00142 00022 .SS(A first look at %3cons%1)
C00154 00023 .SS(%3read%* and %3print%*,,P13:)
C00161 00024 .SS(Hashing,hashing,P14:)
C00168 00025 .SS(A review of the structure of the LISP machine,LISP machine)
C00170 00026 .SS(More on symbol tables,symbol tables)
C00174 00027 .SS(Global Variables,global variables)
C00177 00028 .SS(AMBIT/G,AMBIT/G,P26:)
C00181 00029 .<<pic of AMBIT/G for car>>
C00185 00030 when we mark, we reset those nodes which are reachable to point to %7G%4A%1.
C00191 00031 .SS(Garbage collection again,garbage collection)
C00192 00032 .SS(The Contour Model,Contour Model,P31:)
C00194 00033 .SS(Macros and special forms,macros,P18:)
C00209 00034 .SS(%aSM%*-%3eval%*,%3eval%*,P30:)
C00215 00035 .SS(Binding revisited,bind)
C00220 00036 .<<PROBS OF BINDING AND FEXPRS>>
C00222 00037 .SS(Review)
C00226 00038 .SS(%aSM%*:A Simple machine,%aSM%*,P33:)
C00233 ENDMK
C⊗;
.SEC(Evaluation of LISP expressions,Evaluation)
.BEGIN "JRA" TURN ON "←","#";
.SS(Introduction)
%1
We shall now begin to look very closely at the intuitive
process which we have been using in the ⊗→evaluation↔← of LISP expressions.
Why? First, in an abstract sense,
that is the business of programming languages--evaluation of functions.
They aren't called procedure-oriented languages for nothing! Also,
function evaluation is the basic tool in the evaluation of LISP programs.
In fact a very general kind of evaluation process is going on in LISP,
that is evaluation of recursive functions. Later we will
study the mechanisms which will support recursive evaluation.
First, as we mentioned in regard to the polynomial arithmetic
examples, the intuitive mathematical operations say nothing about the process
of evaluation. %3(2*3) + (5*6)%* evaluates to %336%* regardless
of whether you evaluate %32*3%* first or %35*6%* first, or whether you do them in parallel.
If one were forced to explicate mathematical
evaluation, we would probably have to say that it is parallel evaluation.
Already on {yon(P17)} we have seen that order of
evaluation in conditional expressions can make a difference,
and certainly when we try to mechanize LISP
function evaluation on a sequential machine we %2will%* have to choose
%2some%* order of evaluation. We must also make some decision regarding
the order of evaluation of the arguments to a function call,
say %3f[x%41%*;x%42%*; ...x%4n%1]. We will assume that we evaluate the arguments
from left to right. The evaluation scheme which we choose
is called %2⊗→inside-out↔←%* or %2⊗→call-by-value↔←%*. Alternative proposals exist
and have their merits; ⊗→call-by-name↔← (outside-in) evaluation is another
well-known scheme.
Here again, the style of evaluation can make a difference. Consider the example due
to J. Morris:
%3f[x;y] <= [x = 0 → 0; T → f[x-1;f[y-2;x]]]%*. Evaluation of %3f[2;1]%*
will terminate if we always evaluate the leftmost-outermost occurrence of %3f%*.
Thus:
←%3f[2;1] = f[1;f[-1;2]] = f[0;f[f[-1;2]-2;1]] = 0;%*
However, if we evaluate the rightmost-innermost occurrences first the
computation will not terminate.
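The difference is easy to exhibit mechanically. Here is a sketch in Python (not part of the original text; the thunk encoding and function names are our own) contrasting the two regimes on Morris's function:

```python
# Call-by-name: each argument is passed as an unevaluated thunk and only
# forced where the body actually needs it, so the divergent argument
# f[-1;2] is never computed and f[2;1] terminates with 0.
def f_by_name(x, y):
    xv = x()                      # the body tests x, so x must be forced
    if xv == 0:
        return 0
    # f[x-1; f[y-2; x]] -- both arguments passed unevaluated
    return f_by_name(lambda: xv - 1,
                     lambda: f_by_name(lambda: y() - 2, lambda: xv))

# Call-by-value: the inner f[y-2;x] is evaluated before the outer call,
# and that evaluation never terminates (in Python, a RecursionError).
def f_by_value(x, y):
    if x == 0:
        return 0
    return f_by_value(x - 1, f_by_value(y - 2, x))

print(f_by_name(lambda: 2, lambda: 1))   # → 0
# f_by_value(2, 1) would recurse forever.
```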
Intuitively, call-by-value says:
evaluate the arguments to a function before you attempt to apply the arguments
to the function definition. Here are some examples:
Let %3f[x;y]%* be %3x%82%* + y%1 and consider %3f[3+4; 2*2]%*. Then
call-by-value says evaluate the arguments (%37%*#and#%34%*) associate those
with the formal parameters of %3f%*(i.e. %3x:7 ;y:4%*) and evaluate the body of
%3f%* (i.e.#%37%82%*#+#4#=#53%1); call by name says pass the unevaluated actual
parameters to the function, then perhaps evaluate. I.e. %3(3+4)%82%*#+#2*2%1
arises (which also evaluates to %353%*). We will say more about other styles of
evaluation later. However most of this section will be restricted to call-by-value.
Notice that the process of evaluation by value
is recursive. That is, it goes something like this:
If the expression to be evaluated is a constant then the value
(of the expression) is that constant. (the value of %33%* is %33%*).
If the expression is a variable then see what the current value associated
with that variable is. (Within the evaluation of say
%3f[3;4]%* where %3f[x;y]#<=#x%82%*#+#y %1the current value of the variable
%3x%* is %33%*.) The only other kind of expression that we can have is a function
name followed by arguments. (Say %3f[3;4]%*). In this case we evaluate the arguments,
that is, we %2recur%* and then apply the definition of the function to those
evaluated arguments. How do you apply the evaluated arguments to the function
definition? You associate or ⊗→bind↔← the formal parameters (or variables)
of the definition to the values of the actual parameters.
Then you evaluate the body of the function. Why all the emphasis
on the mechanism of function evaluation? For one reason implementation
of programming languages requires careful consideration of this
problem. If you understand the formal model of LISP and the corresponding
model of implementation, you should have an exceptionally
clear picture of the problems of implementation of languages, and you
will understand many of the problems of semantics of languages.
How do we transcribe the evaluation of arithmetic functions to LISP functions?
It's easy. First, the class of LISP constants is larger
(than the integers). What are the constants of LISP? They're just the Sexprs.
(The value of %3(A#.#B)%* is %3(A#.#B)%* --- just like the value of %33%* is %33%*
⊗↓We are ignoring the distinction between the %2numeral%* %33%* and the %2number%3 3.%1←).
The additional artifact of LISP which we have to talk about with respect to
evaluation is the conditional expression. But clearly its evaluation can also
be precisely specified. See {yon(P40)}.
In more specific detail here is some of the structure of the
LISP evaluation mechanism:
If the expression to be evaluated is a constant then the value
is that constant.
If the expression is a conditional then it is of the form
%3[p%41%* → e%41%*; p%42%* → e%42%*; ... p%4n%* → e%4n%1]. Evaluate it
using the semantics defined on {yon(P40)}.
If the expression is of the form: function name followed by
arguments then:
.BEGIN TABIT1(5);
\%21.%1 evaluate the arguments (from left to right)
\%22.%1 find the definition of the function
\%23.%1 associate the evaluated arguments with the formal parameters in the function definition
\%24.%1 evaluate the body of the function.
.END
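The four steps above can be sketched as a tiny evaluator. The following Python sketch is our own illustration (the names %3evaluate%* and %3defs%*, and the primitives %3plus%* and %3times%*, are assumptions, not from the text):

```python
# A toy call-by-value evaluator for expressions built from numbers,
# variables, two primitives, and user-defined functions.
def evaluate(e, env, defs):
    if isinstance(e, (int, float)):      # a constant is its own value
        return e
    if isinstance(e, str):               # a variable: its current binding
        return env[e]
    op, *args = e
    vals = [evaluate(a, env, defs) for a in args]   # 1. evaluate args, left to right
    if op == 'plus':
        return vals[0] + vals[1]
    if op == 'times':
        return vals[0] * vals[1]
    params, body = defs[op]              # 2. find the definition
    # 3. bind formals to the evaluated args; 4. evaluate the body
    return evaluate(body, dict(zip(params, vals)), defs)

# f[x;y] <= x*x + y, then f[3;4]
defs = {'f': (('x', 'y'), ('plus', ('times', 'x', 'x'), 'y'))}
print(evaluate(('f', 3, 4), {}, defs))   # → 13
```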
So we %2can%* describe the outline of a mechanical procedure which
can be used in the evaluation of LISP functions. Recall our discussion
of the differentiation formulas in {yonss(P41)}. We described the mechanism of
differentiating as a recursive process. Then we mechanized that scheme
as a LISP function, translating the polynomials to list notation,
since the domain of LISP functions is Sexprs. We are in a similar
situation here. It should be clear that we could write a LISP
function representing the evaluation process provided that we can
find a representation for LISP expressions as Sexpressions.
This mapping of LISP expressions to Sexprs is our first order of business. We
will accomplish this mapping by using an extension of the scheme introduced in
{yonss(P41)}.
This whole process of mapping LISP expressions onto sexprs, and writing a LISP
function to act as an evaluator may seem a bit incestuous or difficult to fathom.
First, it is no more obscure than the differentiation example.
It is just another instance of the diagram of {yon(P46)},
only now we are applying the process to LISP itself. The effect is to force
us to make precise exactly what is meant by LISP evaluation. This
precision will have many important ramifications.
Second, we've been doing evaluation of sexpr representations of
LISP expressions already.
The %2great mother of all functions%* is exactly the evaluation mechanism for
the LISP primitive functions and predicates, %3car, cdr, cons, atom%* and %3eq%*
when restricted to functional composition and constant arguments.
The %2great mother revisited%* is the extension to conditional expressions.
But let's stop for some examples of translating LISP functions into Sexpr notation.
.SS(Sexpr translation of LISP expressions)
.BEGIN NOFILL;TABS 25,50;TURN ON "\";
.GROUP
\%2LISP expression\translation as Sexpr
%1we will represent numbers just as numbers, e.g.:
\%32\2
%1we will translate identifiers to their upper-case counterpart. Thus:
%3\x\X
\y2\Y2
\car\CAR
.APART
.GROUP
.BEGIN FILL;
%1Now we've got a problem:
We wish to map arbitrary LISP expressions to Sexpressions. The LISP expression,
%3x%* translates to %3X%*; %3X%* is itself a LISP expression (a constant);
%2it%* must also have a translate.
So we must be a little careful here.
We will wish to interpret the sexpr output
from Son of Great Mother as a value for a LISP calculation. That is, given the output,
there should be exactly one way of interpreting it as an answer.
The mapping must be 1-1. So we must represent %3x%* and %3X%* as %2different%* sexprs.
The translation scheme we pick is:
for any sexpression, y, its translation is %3(⊗→%3QUOTE%*↔←%1 y%3)%1.
For example:
.END
\%3X\(QUOTE X)
\(A . B)\(QUOTE (A . B))
\QUOTE\(QUOTE QUOTE)
.APART
.GROUP
%1
.BEGIN FILL;
We have already seen one satisfactory mapping for functions in prefix form
in {yonss(P41)}.
We will use that mapping (called Cambridge Polish) here. That is:
.END
\%3f[e%41%*;e%42%*; ...e%4n%*]\(F e%41%*' e%42%*' ...e%4n%1')
\\ where %3e%4i%1' is the translate of %3e%4i%3
%1Examples:%*
\car[x]\(CAR X)
\car[X]\(CAR (QUOTE X))
\cons[cdr[(A . B)];x]\(CONS (CDR (QUOTE (A . B))) X)
.APART
.GROUP
%1What's left? conditional expressions. The mapping is self-explanatory.
\%3[p%41%* → e%41%*; ... p%4n%* → e%4n%*]\(⊗→%3COND%*↔← (p%41%*' e%41%*') ... (p%4n%*' e%4n%*'))
%1and an example:%3
\[atom[x] →1; q[y] → X]\(COND ((ATOM X) 1) ((Q Y) X))
%1
.APART
.END
.P44:
Notice that %3(COND#(#...#))%* and %3(QUOTE#...#)%* %2look%* like translates of function
calls of the form %3cond[#...#] %* and %3quote[#...#]%*. This is %2not%* the case,
%3QUOTE%* and %3COND%* are not translates of LISP functions.
The constructs %3QUOTE%* and %3COND%* do not use call-by-value evaluation and
are handled in a special way as we shall soon see.
Since %3T%* and %3NIL%* are used so frequently, we chose their
translates to be %3T%* and %3NIL%* rather than %3(QUOTE#T)%* and %3(QUOTE#NIL)%*.
You should now go back and look at the %2tgm%*'s ({yonss(P39)}) again, now that
you know what they mean. You should note that with respect to atoms, the great
mothers are only defined for %3T%* and %3NIL%*. Other atoms (which are
the translates of variables) are not recognized and return a h.s. message.
The question of variables requires some careful study. Before we do that,
let's see what the great mothers actually accomplish.
First, tgm's explicitly tell you what the meaning of various subsets
of LISP is. That is, the tgms explicitly codify the semantics (or meaning)
of LISP. Next, if you are able to code tgm on a machine, then you can
input the Sexpr form of LISP expressions and they will be evaluated for you.
%2Great mother does it all.%*
The part of our informal argument which is clearly missing
in the tgms is the details of handling variables and the names of functions. That is,
how do we describe the binding of variables to values as a mechanical process
and, given the name of a defined function, how do we retrieve the actual definition?
These are actually the same problem: that of symbol table manipulations.
.SS(A first look at symbol tables,symbol tables)
%1
One of the distinguishing features of Computer Science is
its reliance on the ability to store and recover information.
Any formalism which purports to address itself to Computer Science
must take this into account. In particular we must be able to
handle the effect of assignment or binding, that is, the association
of a value with a name. A common device used to accomplish this
is a symbol table. In essence, a symbol table is simply a set of ordered pairs
of objects (a function or relation, if you wish). One of the
elements of each pair is a name; the other is considered to be
a value. We will come back to symbol tables in a while to study
the problems of efficient storage and retrieval of information.
It will suffice now simply to think of a symbol table as represented
in LISP by a list of dotted pairs --- name dotted with value---.
These simple symbol tables are also known as ⊗→association lists↔←
or %2⊗→a-lists↔←%*.
Since we are encoding the %2set%* of objects as a list, we are actually
encoding it as a %2sequence%*. It will be quite useful to exploit this
ordering imposed in the sequence.
.GROUP
Recall that we are representing variables as atoms; then if %3x, y, %*and %3z%*
had current values %32, 3, %*and %34%*, then a symbol table describing that fact could
be encoded as:
←%3((X . 2)(Y . 3)(Z . 4)) .%1
.APART
.GROUP
In the sequel, we will need to add elements to a table and, given a name, find
the corresponding value.
Here's a simple symbol-table-searching function, ⊗→%3assoc%*↔←:
.BEGIN TABS 10,25;TURN ON "\";NOFILL;
%3
\assoc[x;l] <= \[null[l] → NIL;
\\ eq[caar[l];x] → car[l];
\\ T → assoc[x;cdr[l]]]
%1e.g.%*
←assoc[Y;((X . 2)(Y . 3)(Z . 4))] = (Y . 3)
←assoc[U;((X . 2)(Y . 3)(Z . 4))] = NIL
.END
.APART
%1
%3assoc%* will examine the table from left to right, looking for the first
pair whose %3car%*-part matches the given atom. If such a pair is found,
then that pair is returned. If %2no%* such pair is found, the
result is %3NIL%*. Notice there may be more than one pair with the same
%3car%*-part; only the %2first%* is found.
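The behavior of %3assoc%* can be sketched directly in Python (our own rendering, not from the text): the table is a list of (name,#value) pairs searched left to right, with %3None%* playing the role of %3NIL%*.

```python
# assoc: return the first pair in table l whose name-part is x, else None.
def assoc(x, l):
    if not l:
        return None            # NIL: no pair with car-part x
    if l[0][0] == x:
        return l[0]            # the first matching pair wins
    return assoc(x, l[1:])

print(assoc('Y', [('X', 2), ('Y', 3), ('Z', 4)]))   # → ('Y', 3)
print(assoc('U', [('X', 2), ('Y', 3), ('Z', 4)]))   # → None
```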
.SS(Introduction to %3λ%*-expressions)
How about adding things to tables? We will see that %3cons%* plus
the recursion mechanism will automatically update the symbol table
for %2Great-mother%* when we call a function.
We will be adding entries to the symbol table when we call a function,
binding the formal parameters to the actual parameters. Intuitively,
when we call %3f[x;y] <= (x*y + y) %1in%3 f[2;3]%1, the dotted pairs
%3(X . 2) %*and %3 (Y . 3)%* should find their way into the table.
Besides the binding of formal parameters to actual values, there is
another binding process occurring in LISP.
Before we can call the function named %3f%*, its definition must
be in the symbol table; so there should be an entry (dotted pair)
with name-part %3F%* and a value-part representing %3(x*y + y)%*. An obvious
representation is perhaps,
←%3(PLUS (TIMES X Y) Y)%*
That's not quite good enough. If we simply associate the name %3F%*
with the list %3(PLUS#(TIMES#X#Y)#Y)%*, we will have lost some crucial information.
Namely, the relation between the variables %3x%* and %3y%* and the
body of the function will not be present. For example, %3f[y;x]#<=#(x*y#+#y)%*
is %2not%*
the same function, but in our naive translation they have the same
representation. The solution is obvious: we must associate the list of
formal parameters with the body of the definition. Perhaps,
←%3((X Y)(PLUS (TIMES X Y) Y)).%*
This is the right approach and it can be made precise.
There is a very elegant and beautiful theory behind this intuitive
discussion. We will sketch a few parts which will be helpful, but
first let's see where we are:
.SS(Review of evaluation)
We have seen that it is possible to mechanize at least one
scheme for evaluation of functions -- left-to-right and call-by-value.
We have seen that it is possible to translate LISP expressions
into Sexprs in such a way that we can write a LISP function
which will act as an evaluator for such translates. In the process
we have had to mechanize the intuitive devices we (as humans) might
use to recall the definition of functions and to recall the current
values of variables. It was soon clear that the same mechanism
could be used: that of symbol tables. To associate a variable with
a value was easy; to associate a function name with its definition
required some care. That is, part of the definition of a function
involves the proper association of formal parameters with the
body of the definition. (We actually need to be a bit more
precise than this in describing recursive functions, but this is
good enough for now.) The device we chose is called the %2lambda
notation%*.
.SS(%3λ%*-expressions,%3λ%*-expressions,P49:)
***more***
The λ-notation is a device invented by the logician, Alonzo Church, in
conjunction with his studies in logic and foundations of mathematics.
It was introduced into Computer Science by John McCarthy in
the description of LISP.
.BEGIN TURN ON "←";
As we intimated previously, the λ-notation is used in LISP to allow
us to attach names to functions in an unambiguous manner.
We have been very imprecise when writing %3f[x;y]#<=#(x*y#+#y)%* as
a definition of the function %3f%*.
First note that %3f[x;y]%* is %2not%* a function, %3f%* is.
To see what %3f[x;y]%* means consider the following example.
When we are asked to evaluate %3car[(A . B)]%* we say the value is %3A%*.
%3car[(A . B)]%* is an expression to be evaluated; it is a LISP ⊗→form↔←.
If %3car[(A . B)]%* %2is%* a form then so is %3car[x]%*;
only now the value of the
form depends on the current value assigned to the variable %3x%*.
So the %2⊗→function↔←%* is %3car%*; the %2form%* is %3car[x]%*.
Also our notation has really been specifying more than just the name.
The notation specifies the formal parameters (%3x%* and %3y%*) and
the order in which we are to associate actual parameters in a call
with the formals of the definition (%3x%* with the first, %3y%* with the second).
More subtly, the notation tells %2which%* variables in the function body
are bound variables. That is, which variables are to expect values when the
function is called. For example define %3g[x]#<=#(x*y#+#y)%*; then the
value of %3g[2]%* is well defined and is itself a function.
We wish to have a notation to assign all this other information with the
function body so that function definitions can be inserted into the symbol
table as "values". They will be parametric values, but they will be values.
The λ-notation performs this task by preceding the function body with
a list of the bound variables, called %2⊗→lambda variables↔←%*.
The list of variables is usually referred to as the %2⊗→lambda list↔←%*.
Then this resulting construct is preceded by "λ[" and followed by "]".
Thus using the above example, the function %3f%* is exactly the
same function as %3λ[[x;y]#x*y#+#y]%*.
We actually need a bit more than
λ-notation to specify recursive functions in a natural manner. See {yonss(P38)}.
The λ-notation introduces nothing new as far as our intuitive binding and
evaluation processes are concerned, it only makes these operations more
precise. So to evaluate a λ-expression, bind the evaluated actual parameters
to the λ-variables and evaluate the function body.
Since we intend to include λ-expressions in our language we must include
a translation of them into sexpression form:
.BEGIN GROUP;TABIT2(25,50);
\%2LISP expression\translation as Sexpr%*
\%3λ[[x%41%*; ...; x%4n%*]%9x%*]\(LAMBDA (X%41%* ... X%4n%*) %1translate of %9x%3)
.END
Thus the character, λ, will be translated to %3LAMBDA%*. %3LAMBDA%* is therefore a
kind of prefix operator (like %3COND%*) which allows the evaluator to recognize
the occurrence of a function definition (just as %3COND%* is used by the
evaluator to recognize the occurrence of a (translate of a) conditional
expression).
Here are some examples of %3λ%*-expressions and their evaluation,
followed by some examples of the translation scheme:
.BEGIN NOFILL TABS 10;TURN ON "\";
\%2a. %3λ[[x;y] x%82%* +y][2;3] = 2%82%* + 3 = 7
\%2b. %3λ[[x;y] cons[car[x];y]][(A . B); C] = (A . C)
\%2c. %3λ[[y] cdr[y]][(A B)] = (B)
\%2d. %3λ[[x] car[x]][λ[[y] cdr[y]][(A B)]]
\ = λ[[x] car[x]][(B)]
\ = B
\%2e. %3λ[[x;y] x%82%* +y] ==> (LAMBDA(X Y)(PLUS (EXPT X 2) Y))
\%2f. %3λ[[x;y] cons[car[x];y]] ==> (LAMBDA(X Y)(CONS (CAR X) Y))
.END
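Examples %2a%*, %2c%* and %2d%* above can be checked with Python lambdas, which are also applied call-by-value. This is our own sketch; Python lists stand in for S-expressions, so example %2b%*'s dotted pair is omitted.

```python
# Example a: λ[[x;y] x^2 + y][2;3]
print((lambda x, y: x**2 + y)(2, 3))                     # → 7
# Example c: λ[[y] cdr[y]][(A B)] -- cdr rendered as the list tail
print((lambda y: y[1:])(['A', 'B']))                     # → ['B']
# Example d: λ[[x] car[x]][λ[[y] cdr[y]][(A B)]]
print((lambda x: x[0])((lambda y: y[1:])(['A', 'B'])))   # → B
```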
******MORE*******
Our LISP syntax equations should now be augmented to include:
.BEGIN TABIT1(11);GROUP;
<function>\::= λ[<varlist><form>]
<varlist>\::= [<variable>; ... ; <variable>]
.END
One of the other more valuable applications of λ-notation occurs in that
nebulous area called efficiency. Efficiency is like structured programming --
difficult to define but recognizable when it occurs.
Consider the following sketch of a function definition:
←%3g <= λ[[x][p[lic[x]] → lic[x]; .... x ...]]%*.
Where %3lic%* may be a %2l%*ong %2i%*nvolved %2c%*alculation.
We certainly must compute %3lic[x]%* %2once%*. But as %3g%* is defined,
we would compute %3lic[x]%* %2again%* as e%41%* if p%41%* is true. Since our
current LISP subset has no side effects, this second calculation
is unnecessary. %3g%* is inefficient. Using λ-expressions, in a style
called %2⊗→internal lambdas↔←%* we can improve %3g%* as follows:
Replace the body of %3g%* with,
←%3λ[[y][p[y] → y; ... x ...]][lic[x]]%*. (*1*)
Call this new function %3g'%*:
←%3g' <= λ[[x]λ[[y][p[y] → y; ... x ... ]][lic[x]]] %1.
Now when %3g'%* is called we evaluate the actual parameter, binding it
to %3x%*, as always; and evaluate the body, (*1*). Evaluation of (*1*)
involves calculation of %3lic[x]%* (%2once%*), binding the result to
%3y%*. We then evaluate the body of the conditional expression as before.
If p%41%* %2is%* true, then this definition of %3g'%* involves
one calculation of %3lic[x]%* and two table look-ups (for the value of %3y%*),
rather than the two calculations of %3lic[x]%* in %3g%*.
More conventional programming languages can obtain the same effect as
this use of internal lambdas by assignment of %3lic[x]%* to a
temporary variable. We will introduce assignment statements in LISP in
{yonss(P37)}.
.END
****problems***
.SS(Mechanization of evaluation,,P6:)
%1
As we have said, %3tgmoaf%* and %3tgmoafr%* ({yonss(P39)}) are evaluators
for subsets of LISP.
Armed with your symbol-table mechanism we could now extend the
%2great-mothers%* to handle variable look-ups.
Rather than do this we will display our burgeoning cleverness
and make a total revision of the structure of the
evaluators. In making the revision, the
following points should be remembered:
1. Expressions (to be evaluated) can contain variables, both simple
variables and variables naming λ-expressions. Thus evaluation must be done
with respect to an environment or symbol table. We wish to recognize other
(translates of) function names besides %3CAR, CDR, CONS, EQ%* and%3 ATOM%*
in our evaluator, but
explicitly adding new definitions to the evaluator in the style of the
recognizers for the five primitives is a bad idea. Our symbol table should
hold the function definitions and the evaluator should contain the general
schemes for finding the definitions, binding variables to values, and
evaluating the function body. By now you should be convinced that this
process is a reasonably well defined task and could be written in LISP.
2. All %2functions%* are to be evaluated by-value. However there are
some %2⊗→special forms↔←%* we have seen which are not evaluated in the
normal manner. In particular, conditional expressions, quoted expressions,
and lambda expressions are handled differently.
With this short introduction we will now write a more general evaluator
which will handle a larger subset of LISP than the %3tgm%*s.
The primary function is named ⊗→%3eval%*↔← (rather than Son of Great Mother).
It will take two arguments; the first
is a representation of an expression to be evaluated, and the second
is a representation of a symbol table. %3eval%* will recognize numbers, the
constants %3T%* and %3NIL%*, and if presented with a variable, will attempt to
find the value of the variable in the symbol table.
%3eval%* also handles the previously mentioned special forms. If %3eval%* sees
a ⊗→conditional expression↔← (represented by %3(COND ...)%* ) then the body
of the ⊗→%3COND%*↔← is passed to a subfunction named ⊗→%3evcond%*↔←.
%3evcond%* encodes the semantics as described on {yon(P40)}. The special
form, ⊗→%3QUOTE%*↔←, signifies the occurrence of a constant. Simply return
that constant.
As far as this %3eval%* is concerned, any
other expression is a function-call. In this case we %2apply%* the function
to the list of evaluated arguments. The function may either be a name or
it may be a λ-expression. %3apply%* handles both cases.
.BEGIN NOFILL;TURN ON "\";TABS 16,27;
.GROUP
Here's the new %3eval%*:
%3
⊗→eval↔← <= λ[[e;a]\[atom[e] →\[numberp[e] → e;
\\ eq[e;T] → T;
\\ eq[e;NIL] → NIL;
\\ T → cdr[assoc[e;a]]];
\ atom[car[e]] →\[eq[car[e];QUOTE] → cadr[e];
\\ eq[car[e];COND] → evcond[cdr[e];a];
\\ T → apply[car[e];evlis[cdr[e];a];a]];
\ T → apply[car[e];evlis[cdr[e];a];a]]]
.APART
.GROUP
%1where:%*
evcond <= λ[[e;a][eval[caar[e];a] → eval[cadar[e];a];
\ T → evcond[cdr[e];a]]]
.APART
.GROUP
%1and,%*
evlis <= λ[[e;a][null[e] → NIL;
\ T → cons[eval[car[e];a];evlis[cdr[e];a]]]]
.APART
%1
.END
The subfunctions, %3evcond%* and %3evlis%*, are simple. %3evcond %*you've seen
before in %3tgmoafr%*; %3evlis%* simply manufactures a new list consisting of the
results of evaluating (from left to right) the elements of %3e%*, using the
symbol table, %3a%*, where necessary.
The interesting innovations appear in the function, %3apply%*.
⊗→%3apply%*↔← takes three arguments: a function, a list of evaluated arguments,
and a symbol table. %3apply%* explicitly recognizes the five primitive functions %3CAR,
CDR, CONS, EQ, %*and%3 ATOM%*. If the function name is a variable
(represented as an atom), then the
definition is located in the symbol table by %3eval%* and applied to the arguments.
Otherwise the function must be a λ-expression. This is where things
get interesting. We know what we must do: evaluate the body of the
λ-expression after binding the formal parameters of the λ-expression
to the evaluated arguments. How do we ⊗→bind↔←? We add variable-value
pairs to the symbol table. We will define a subfunction, %3pairlis%*, to
perform the binding. Then all that is left to do is give the function
body and the new symbol table to %3eval%*. Here is %3apply%*:
.BEGIN NOFILL;TURN ON "\";TABS 20,28;
.GROUP
%3
⊗→apply↔← <= λ[[fn;x;a][atom[fn] → [\eq[fn;CAR] → caar[x];
\\eq[fn;CDR] → cdar[x];
\\eq[fn;CONS] → cons[car[x];cadr[x]];
\\eq[fn;ATOM] → atom[car[x]];
\\eq[fn;EQ] → eq[car[x];cadr[x]];
\\T → apply[eval[fn;a];x;a]];
\eq[car[fn];LAMBDA] → eval[caddr[fn];pairlis[cadr[fn];x;a]] ]]
.APART
.GROUP
pairlis <= λ[[v;w;x][null[w] → x;
\T → cons[cons[car[v];car[w]];pairlis[cdr[v];cdr[w];x]]]]
.APART
%1
.END
So the only really new development is in the λ-binding process.
Notice that %3pairlis%* makes a new symbol table by consing new pairs onto
the front of the old symbol table; and recall that %3assoc%* finds the
first matching pair in the symbol table. %2This is important.%*
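The %3eval%*/%3apply%* pair above can be transcribed almost line for line. The following Python sketch is our own rendering, under stated simplifications: Python lists stand in for S-expressions (proper lists only, so dotted pairs are not handled and %3CDR%* is the list tail), and tables are lists of (name,#value) pairs.

```python
def assoc(x, a):
    if not a:
        return None
    return a[0] if a[0][0] == x else assoc(x, a[1:])

def m_eval(e, a):
    if isinstance(e, int):
        return e                             # a number is its own value
    if isinstance(e, str):                   # an atom
        if e in ('T', 'NIL'):
            return e
        return assoc(e, a)[1]                # a variable: value-part of its pair
    if e[0] == 'QUOTE':
        return e[1]                          # a constant: return it unevaluated
    if e[0] == 'COND':
        return evcond(e[1:], a)
    return m_apply(e[0], evlis(e[1:], a), a)

def evcond(cs, a):
    if m_eval(cs[0][0], a) != 'NIL':
        return m_eval(cs[0][1], a)
    return evcond(cs[1:], a)

def evlis(es, a):
    return [m_eval(e, a) for e in es]        # evaluate args left to right

def m_apply(fn, x, a):
    if isinstance(fn, str):                  # a function name
        if fn == 'CAR':  return x[0][0]
        if fn == 'CDR':  return x[0][1:]
        if fn == 'CONS': return [x[0]] + x[1]
        if fn == 'ATOM': return 'NIL' if isinstance(x[0], list) else 'T'
        if fn == 'EQ':   return 'T' if x[0] == x[1] else 'NIL'
        return m_apply(m_eval(fn, a), x, a)  # look the definition up
    # fn is (LAMBDA (v1 ... vn) body): bind, then evaluate the body
    return m_eval(fn[2], pairlis(fn[1], x, a))

def pairlis(v, w, a):
    return [(vi, wi) for vi, wi in zip(v, w)] + a   # cons pairs onto the front

print(m_eval(['CONS', ['QUOTE', 'A'], ['QUOTE', ['B']]], []))                # → ['A', 'B']
print(m_eval([['LAMBDA', ['X'], ['CAR', 'X']], ['QUOTE', ['A', 'B']]], []))  # → A
```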
.SS(Examples of %3eval%*,examples of %3eval%*)
After this onslaught, some examples are in order.
Let's evaluate %3f[2;3]%* where %3f <= λ[[x;y] x%82%* + y].%1
After appropriate translation this is equivalent to evaluating:
←%3eval[(F 2 3);((F . (LAMBDA(X Y)(PLUS (EXPT X 2)Y))))]%*
Notes:
%21.%* %3((F . (LAMBDA (X Y) ... ))) = ((F LAMBDA (X Y) ... )) %*
%22.%* Since the symbol table, %3((F ...))%* occurs so frequently in the following
trace, we will abbreviate it as %3st%*. Note that we have no mechanism yet
for permanently increasing the repertoire of known functions. We must
resort to the subterfuge of initializing the symbol table to get the function
%3f%* defined.
%23.%* For this example we must assume that + and ↑ (exponentiation) are known functions. Thus
apply would have to contain recognizers for %3PLUS%* and %3EXPT%*:
.BEGIN NOFILL;
.GROUP
%3
...atom[fn] → [eq[fn;PLUS] → +[car[x];cadr[x]];
eq[fn;EXPT] → ↑[car[x];cadr[x]];
...
.APART
%1
.END
.BEGIN NOFILL;TABS 10,25;TURN ON "\";
.GROUP
So %3eval[(F 2 3);st]
\= apply[car[(F 2 3)]; evlis[cdr[(F 2 3)];st];st]
\= apply[F;evlis[(2 3);st];st]
\= apply[F;(2 3);st]
\= apply[eval[F;st];(2 3);st]
\= apply[(LAMBDA(X Y)(PLUS (EXPT X 2) Y)); (2 3);st]
\= eval[caddr[(LAMBDA(X Y)(PLUS (EXPT X 2) Y))];
\ pairlis[cadr[(LAMBDA(X Y)(PLUS (EXPT X 2) Y))];(2 3);st]]
\= eval[(PLUS (EXPT X 2) Y);pairlis[(X Y);(2 3);st]]
\= eval[(PLUS (EXPT X 2) Y);((X . 2)(Y . 3)(F LAMBDA(X Y) ...))]
\= apply [PLUS; evlis[((EXPT X 2) Y);((X . 2)(Y . 3)..)];((X . 2)...)]
\ %1Let's do a little of:%3 evlis[((EXPT X 2)Y);((X . 2)(Y . 3)...)]
\\= cons[eval[(EXPT X 2);((X . 2)(Y . 3) ...)];
\\ evlis[(Y);((X . 2) ...)]]
\\= cons[apply[EXPT;evlis[(X 2);((X . 2)...)];((X . 2) ...]
\\ evlis[(Y); ...]]
\\= cons[apply[EXPT;(2 2);((X . 2) ...];evlis[(Y); ...]]
\\= cons[↑[car[(2 2)];cadr[(2 2)]];evlis[(Y); ... ]]
\\= cons[↑[2;2];evlis[(Y); ... ]]
\\= cons[4;evlis[(Y);((X . 2)(Y . 3) ....)]]
\\= cons[4;cons[eval[Y;((X .2) ...)]; evlis[NIL;(( ...))]]]
\\= cons[4;cons[3;NIL]]
\\= (4 3) %1 which you knew anyway.
Now back to apply:
%3
\= apply[PLUS;(4 3);((X . 2)(Y . 3) ... )]
\= +[4;3]
\= 7 %1which you also knew.
.APART
.END
It should now be clear that %3eval%* and friends %2do%* begin to
perform as you would expect. It perhaps is not clear that a
simpler scheme might do as well. In particular, the complexity
of the symbol table mechanism which we claimed was so important
has not been exploited. The next example will indeed show that a
scheme like ours is necessary to keep track of variable bindings.
.GROUP
Let's sketch the evaluation of %3fact[3]%* where:
←%3fact <= λ[[x][x = 0 → 1;T → *[x;fact[x-1]]]].%*
.P42:
That is, %3eval[(FACT 3);st] %*where %3st%* names the initial symbol table,
←%3((FACT LAMBDA(X)(COND((ZEROP X)1)(T(TIMES X(FACT (SUB1 X))))))).%1
.APART
In this example we will assume that the binary function, *, the
unary predicate, %3zerop#<=#λ[[x]x#=#0]%* and unary function,
%3 sub1#<=#λ[[x]#x-1]%* are known and are
recognized in the evaluator by %3TIMES, ZEROP%* and %3SUB1%* respectively.
.BEGIN NOFILL;TURN ON "\";TABS 10,20,30;
.GROUP
%1Then%3 eval[(FACT 3);st]
\= apply[FACT;evlis[(3);st];st]
\= apply[(LAMBDA(X)(COND ...));(3);st]
\= eval[(COND((ZEROP X)1)(T( ...)));((X . 3) . st)]
\= evcond[(((ZEROP X)1)(T(TIMES X(FACT (SUB1 X)))));((X . 3) . st)]
%1(Now, let %3st1%* be%3 ((X . 3) . st) )
\= eval[(TIMES X(FACT (SUB1 X))); st1]
\= apply[TIMES;evlis[(X (FACT (SUB1 X))); st1];st1]
\= apply[TIMES;cons[3;evlis[((FACT (SUB1 X))); st1]];st1]
%1Now things get a little interesting inside %*evlis:
\evlis[((FACT (SUB1 X)));st1]
\\= cons[eval[(FACT (SUB1 X)); st1];NIL]
\\%1 and %* eval[(FACT (SUB1 X));st1]
\\\= apply[FACT;evlis[((SUB1 X));st1];st1]
\\\= apply[FACT; (2);st1]
\\\= apply[(LAMBDA (X)(COND ...));(2);st1]
\\\= eval[(COND((ZEROP X) 1) ...);((X . 2) . st1)]
\\\...
.APART
.FILL
%1
Notice that within this latest call on %3eval%* the symbol-table-searching function,
%3assoc%*, will find the pair %3(X . 2)%* when looking for the value of %3x%*. This is
as it should be. But notice also that the older binding, %3(X . 3)%*, is still
around in the symbol table %3st1%*, and will become accessible once we complete
this latest call on %3eval%*.
.GROUP
As the computation continues, the symbol table proceeds from,
.NOFILL
%3
←st =((FACT LAMBDA(X)(COND ...))) %1to%3,
←((X . 3) . st) %1to%*,
←((X . 2)(X . 3) . st) %1to%*,
←((X . 1)(X . 2)(X . 3) . st) %1to finally%*,
←((X . 0)(X . 1)(X . 2)(X . 3) . st).
%1
.FILL
.APART
Since the search function, %3assoc%1, always proceeds from left to right through
the table and since the table entry function, %3pairlis%*, always %3cons%*es pairs
onto the front of the table before calling %3eval%*, we will get the correct
binding of the variables.
This symbol table mechanism is very important, so let's look at it
again in a slightly different manner.
.END
.P43:
We will represent calls on %3eval%* informally, using LISP expressions
rather than Sexprs as input. We represent the symbol table between vertical
bars, "|", in such a way that if a table, t%41%*, is:
.BEGIN TABIT2(10,17);
\| b%4n%*\|
\| ...\| then %3cons%*ing a new element, b%4n+1%* onto t%41%* gives:
\| b%41%*\|
\| b%4n+1%*\|
\| b%4n%*\|
\| ...\|
\| b%41%*\|
.END
%3
.BEGIN NOFILL;TABS 10,46,58;TURN ON "\";
%1Then:%*
eval[`fact[3]'; |fact = λ[[x][x=0 → 1;T → *[x;fact[x-1]]]]|]
.GROUP
\= eval[`λ[[x][x=0 → 1; T → *[x;fact[x-1]]]];\|x = 3 | ]
\\|fact = λ[[ ... |
.APART
.GROUP
\= *[3;eval[`λ[[x] ....'; \| x = 2 |]
\\| x = 3 |
\\|fact = λ[[ ...] |
.APART
.GROUP
\= *[3; *[2;eval[`λ[[x] ... ';\| x = 1 |]
\\| x = 2 |
\\| x = 3 |
\\|fact = λ[[ ... |
.APART
.GROUP
\= *[3; *[2; *[1;eval[`λ[[x] ... ';\| x = 0 |]
\\| x = 1 |
\\| x = 2 |
\\| x = 3 |
\\|fact = λ[[ ... |
.APART
.GROUP
\= *[3; *[2; *[1;1]]] %1with:%*\| x = 1 |
\\| x = 2 |
\\| ... |
.APART
.GROUP
\= *[3; *[2;1]] %1with:%*\| x = 2 |
\\| ... |
.APART
.GROUP
\= *[3;2] %1with:%*\| x = 3 |
\\| ... |
\= 6. %1with:%*\|fact = λ[[ ... |.
.END
%1
Notice that after we went to all the trouble to save the old values
of %3x%* we never had to use them. However, in the general case of
recursive evaluation we must be able to save and restore the old values
of variables.
For example, recall the definition of %3equal%*:
.BEGIN NOFILL TURN ON "\";TABS 17;
%3
equal <= λ[[x;y][atom[x] → [atom[y] → eq[x;y];T → NIL];
\ atom[y] → NIL;
\ equal[car[x];car[y]] → equal[cdr[x];cdr[y]];
\ T → NIL]]
%1If we were evaluating:%3
←equal[((A . B) . C);((A . B) . D)],
%1
then our symbol table structure would be changing as follows:
.END
%3
.BEGIN NOFILL;TABS 10,40;TURN ON "\";
.GROUP
\|equal = λ[[x;y] ...| ==>\|x = ((A . B) . C)| ==>
\\|y = ((A . B) . D)|
\\|equal = λ[[x;y]... |
\| x = (A . B) |\| x = A |
\| y = (A . B) |\| y = A |
\| x = ((A . B). C)| ==>\| x = (A . B) |
\| y = ((A . B). D)|\| y = (A . B) | ==>
\|equal = λ[[x;y ... |\| x = ((A . B). C) |
\\| y = ((A . B). D) |
\\|equal = λ[[x;y] ...|
.APART
\| x = B |\| x = C |
\| y = B |\| y = D |
\| x = (A . B) |\| x = ((A . B). C) | ==>
\| y = (A . B) | ==>\| y = ((A . B). D) |
\| x = ((A . B). C) |\|equal = λ[[x;y] ... |
\| y = ((A . B). D) |
\|equal = λ[[x;y] ... |
←|equal = λ[[x;y] ... |
.END
%1
This complexity is necessary, for while we are evaluating
%3equal[car[x];car[y]]%*, we rebind %3x%* and %3y%* but we must save the old
values of %3x%* and %3y%* for the possible evaluation of %3equal[cdr[x];cdr[y]].%*
But the claim is that using %3pairlis%* to cons the new
bindings onto the front of the symbol table as we call %3eval%*
does the right thing. That is, in %3apply:
←eq[car[fn];LAMBDA] → eval[caddr[fn];pairlis[cadr[fn];x;a]].
%1
The tricky part is that when we leave that particular call on
%3apply%*, the old table is automatically restored by the recursion
mechanism. That is, if %3st%* is the current symbol table then
%3cons%*-ing things onto the front of %3st%* doesn't change %3st%*, but if we
call %3eval%* or %3apply%* with a symbol table of say:
←%3cons[(X . 2);cons[(X . 3); st]] %*
then in %2that%* call on %3eval%* or %3apply%* we have access to %3x = 2%*, not
%3x = 3%*.
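The whole mechanism can be summarized in modern notation. Here is a sketch, in Python and purely for illustration (the names and list representation are ours, not LISP's), of the table as an association list: %3pairlis%* conses new bindings onto the front without disturbing the old table, and %3assoc%* searches left to right, so the newest binding shadows the older ones while the caller's table survives intact.

```python
# A sketch (Python, illustration only) of the LISP symbol table as an
# association list: pairlis adds bindings, assoc retrieves them.

def pairlis(variables, values, table):
    # Pair each variable with its value and cons the pairs onto the
    # FRONT of the table; the old table object is never modified.
    return list(zip(variables, values)) + table

def assoc(variable, table):
    # Search left to right, so the most recent binding is found first.
    for (var, val) in table:
        if var == variable:
            return val
    raise KeyError(variable)

st = []                          # the initial (empty) table
st1 = pairlis(["X"], [3], st)    # binding for the outer call: x = 3
st2 = pairlis(["X"], [2], st1)   # the recursive call rebinds: x = 2

# Inside the recursive call the new binding shadows the old one, but
# the outer table st1 still holds x = 3, unchanged.
```

When the recursive call returns, evaluation simply resumes with %3st1%*; nothing need be undone, which is exactly the automatic restoration described above.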
.REQUIRE "PROB8.PUB" SOURCE_FILE;
.SS(%3label%*,%3label%*,P38:)
One effect of placing "λ" and a list of λ-variables in front
of an expression is to bind those variables which appear both in
the λ-list and the expression. All other variables
appearing in the expression are %2⊗→free variables↔←%*.
For example, %3f%* is free in the following:
←%3f <= λ[[x][zerop[x] → 1; T → *[x;f[x-1]]] ].%1
Clearly our intention is that the %3f%* appearing to the right of "<="
is the same %3f%* appearing to the left of "<=". That is, we want %3f%* to be
a bound variable and not free.
This has not been a problem for us. We have simply pre-loaded the
symbol table, binding %3f%* to its definition (or value). See {yon(P42)}. LISP
has been equipped with a slightly more elegant device for this binding.
It is called the %3label%* operator. It is called thus:
←%3label[%1<identifier>;<function>]
and has exactly the effect of binding the <identifier> to the <function>.
Note that %3label%* is a special form. To include %3label%* in the LISP syntax add:
.BEGIN TABIT1(11);CENTERIT;
<function>\::= %3label%*[<identifier>;<function>]
and the sexpr translation of the %3label%* construct should naturally be:
←%3(LABEL %1translate of <identifier> translate of <function> %3).
.END
%2←Problem%1
1. Show how to change %3eval%* to handle %3label%*.
.SS(functional arguments,functional argument)
.BEGIN TURN ON "#","←";
Recall our discussion of :
%3eval[(F#2#3);((F#.(LAMBDA(X#Y)(PLUS#(EXPT#X#2)Y))))].%*
We now know this is equivalent to:
%3eval[((LABEL#F#(LAMBDA(X#Y)(PLUS#(EXPT#X#2)#Y)))#2#3);NIL]%1.
In either case, the effect is to bind the name %3f%* to the λ-expression.
Binding is also going on when %3f%* is called: we bind %32%* to %3x%*, and %33%*
to %3y%*. Acknowledging Occam, why not attempt to combine the binding processes?
For example if %3 twice[x]%* is defined to be %3plus[x;x]%*, and %3foo[f;x]%*
were %3f[x]%*, then a truly expensive way of evaluating %3twice[2]%* would appear
to be %3foo[twice;2]%*.
This will almost work; as usual %3foo%* would be considered a %2function%* by %3eval%*
and the arguments would be evaluated. We don't want the %2value%* of %3twice%*, we
want the %2name%*. So to stop evaluation, we might try %3QUOTE%*-ing %3twice%* when
we call %3foo%*; thus %3(FOO#(QUOTE#TWICE)#2)%*. %3QUOTE%*-ing is not quite
good enough, as we will see in a moment, but it would work here. %3f%* in the definition
of %3foo%* is called a %2functional argument%*; %3f%* is a λ-variable, but appears
in the body of the function where we expect a function. You, of course,
should realize by now that %2all%* function names appearing in our LISP expressions
are variables; hopefully they are bound to %2something%* (primitives or λ-expressions)
before we attempt to evaluate them.
If such baroque examples were the limit of functional arguments, their usefulness
would be questionable; however, they are a powerful programming tool, and their
implementation sheds much light on programming language design (see {yonss(P3)}).
In particular, most implementations of LISP include a very useful class
of ⊗→mapping functions↔←.
.BEGIN INDENT 0,10;
⊗→%3maplist%*↔← is a function of two arguments, a list, %3l%*, and a functional
argument, %3fn%*. %3maplist%* applies the function %3fn%* (of one argument)
to the list %3l%* and its %3cdr%*s until %3l%* is reduced to %3NIL%*.
The value of %3maplist%* is the list of the values returned by %3fn%*.
Here's a definition of %3maplist%*:
.END
.GROUP;
←%3maplist <= λ[[fn;l][null[l] → NIL; T → cons[fn[l];maplist[fn;cdr[l]]]]]%*.
Thus:
%3←maplist[REVERSE;(A B C D)] = ((D C B A)(D C B)(D C)(D))%*.
.APART;
.BEGIN INDENT 0,10;
⊗→%3mapcar%*↔← is a similar function, applying %3fn%* not to %3l%* but the
%3car%* of %3l%*. Thus:
.END
.GROUP
.P34:
←%3mapcar <= λ[[fn;l][null[l] → NIL; T → cons[fn[car[l]];mapcar[fn;cdr[l]]]]].%*
and an example:
←%3mapcar[(LAMBDA(X)(CONS X NIL));(A B C D)] = ((A)(B)(C)(D)).%*
.APART
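The two mapping functions can be sketched in Python, purely for illustration (Python lists stand in for Sexprs): %3maplist%* hands %3fn%* the list and each of its successive %3cdr%*s, while %3mapcar%* hands it each element in turn.

```python
# A sketch (Python, illustration only) of maplist and mapcar:
# maplist applies fn to the list and each successive tail;
# mapcar applies fn to each element.

def maplist(fn, l):
    if not l:                            # null[l] -> NIL
        return []
    return [fn(l)] + maplist(fn, l[1:])  # cons[fn[l]; maplist[fn;cdr[l]]]

def mapcar(fn, l):
    if not l:
        return []
    return [fn(l[0])] + mapcar(fn, l[1:])  # fn applied to car[l]
```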
To see why %3QUOTE%* is not sufficient, consider the following strange example:
.BEGIN TABIT2(10,29);
%3\tester <= λ[[l;fn]\[null[l] → NIL;
\\ atom[l] → fn[ ];
\\ T → tester[car[l];(LAMBDA()(TESTER(CDR L) FN))]]]%*
.END
Notice %3fn%* is a functional argument and we are %3QUOTE%*-ing the actual
parameter bound to %3fn%*
in the recursive call on the function, %3tester%*.
Now if we call %3tester%* with: %3tester[(A#(B)#(C));(LAMBDA()(BAZ#(QUOTE#A)))]%*
the following occurs:
%21.%* %3l%* is bound to %3(A (B) (C))%*; %3fn%* is bound to
%3(LAMBDA()(BAZ#(QUOTE#A)))%*.
%22%*. In the conditional expression of %3tester%* p%41%* and p%42%* are false
and thus we evaluate the "otherwise" branch.
%23%*. We rebind %3l%* to %3A%*, rebind %3fn%* to %3(LAMBDA()(TESTER(CDR#L)#FN))%*
and lose big.
%3null[l]%* is false, but %3atom[l]%* is true so we evaluate %3fn%*.
When we attempt to evaluate %3cdr[l]%* we get the wrong binding of %3l%*.
What we find bound to %3l%* is %3A%*, rather than %3(A#(B)(C))%* which we
expected.
The environment which was current at the point of call -- in e%43%* of the body
of %3tester%* -- should have been associated
with %3fn%* when we rebound %3fn%*. That is, we must associate the
symbol table which was current at the %2point of call%* with the functional argument.
This solution might seem a bit drastic, but it is easy to construct
counterexamples to less sophisticated solutions.
What does this say about functions? We have already remarked that functions are
parametric values; to that we must add: functions are also tied to the environment
in which they were created; they cannot be evaluated %3in vacuo%*.
The device LISP used to associate environments with functions is called
the ⊗→%3FUNARG%*↔← hack. (More is said about %3FUNARG%* in {yonss(P3)}.)
Instead of designating an instance of a functional argument by the %3QUOTE%*
operator, we introduce a new operator called ⊗→%3FUNCTION%*↔←. Thus:
←%3function[λ[[]tester[cdr[l];fn]]] %*or,
←%3(FUNCTION(LAMBDA()(TESTER (CDR L) FN))).%*
When %3eval%* sees the construction %3(FUNCTION %1fn%3)%* it returns as value
the list:
←%3(FUNARG###%*fn### current-symbol-table%3)%*.
When %3eval%*, or actually %3apply%*, sees %3(FUNARG %*fn st%3)%*, that is,
when we are %2calling%3 fn%1, we use the symbol table %3st%*, rather than
the current symbol table for accessing global variables. It should be
apparent that %3(QUOTE ...)%* and %3(FUNCTION ...)%* will have the
same effect if %3fn%* has no global (or free) variables.
This seems like a lot of work to allow a moderately obscure construct to appear in a language.
However constructs like functional arguments appear in most programming languages
under different guises, but in LISP the problem %2and the solution%* appear
with exceptional clarity. Also functional arguments may be exploited
to describe very general control structures.
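The %3FUNARG%* device can be sketched in Python, purely for illustration (the names and representation are ours): a functional argument made with %3QUOTE%* carries no environment and sees whatever table is current at the call, while one made with %3FUNCTION%* is packaged as a triple carrying the symbol table that was current at its point of creation.

```python
# A sketch (Python, illustration only) of the FUNARG hack: a FUNCTION-
# style functional argument carries its creation-time symbol table.

def lookup(var, table):
    # search the association list left to right
    for (v, val) in table:
        if v == var:
            return val
    raise KeyError(var)

def make_funarg(body, table):
    # FUNCTION: package the body with the current symbol table
    return ("FUNARG", body, table)

def call(fn, current_table):
    if isinstance(fn, tuple) and fn[0] == "FUNARG":
        _, body, captured = fn
        return body(captured)         # use the creation-time table
    return fn(current_table)          # QUOTE: use the caller's table

outer = [("L", ["A", ["B"], ["C"]])]  # table at point of creation
f_quote = lambda tbl: lookup("L", tbl)          # no environment attached
f_fun = make_funarg(lambda tbl: lookup("L", tbl), outer)

inner = [("L", "A")] + outer          # the recursive call rebinds L
# calling f_quote here finds the rebound L; f_fun finds the old binding
```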
←%2Problems%*
I. What changes should be made to the LISP syntax equations to
allow functional arguments?
II. Use %3foo%* to define a function which computes factorial without
using %3label%* or explicit calls on the evaluator.
.END
.SS(special forms,special form,P8:)
We have remarked that the evaluation scheme for LISP functions is
call-by-value and, for functions with multiple arguments, left-to-right
evaluation of arguments. We have also seen, in %3QUOTE%* and %3COND%*,
that not all forms to be evaluated in LISP fall within this category.
We have already noted on {yon(P44)} that %3QUOTE%* and %3COND%* are
%2not%* translates of functions. Clearly we don't evaluate the arguments to
%3(QUOTE ...)%*; indeed the purpose of %3QUOTE%* was to %2stop%* evaluation.
Also the `arguments' to %3COND%* are handled differently; their evaluation was
handled by %3evcond%*.
We therefore have called %3QUOTE%* and %3COND%* %2special forms%*.
We would like to discuss special forms as a generally useful technique.
Consider the predicates, ⊗→%3and%*↔← and ⊗→%3or%*↔←. We might wish to define
%3and%* to be a binary predicate such that %3and%* is true just in case
%2both%* arguments evaluate to %3T%*; and define %3or%* to be binary
and false just in case both arguments evaluate to %3NIL%*.
Notice two points. First, there is really no reason to restrict these predicates
to be %2binary%*. Replacing the words "binary" by "n-ary" and "both" by "all"
in the above description has the desired effect.
Second, if we evaluate the arguments to these predicates in some order,
say left-to-right, then we could immediately determine that %3and%*
is false as soon as we come across an argument which
evaluates to %3NIL%*; similarly a call on %3or%* for an arbitrary number
of arguments can be terminated as soon as we evaluate an argument giving
value %3T%*.
But if we insist that %3and%* and %3or%* are %2functions%* we can
take advantage of neither of these observations. Functions have a fixed
number of arguments, all of which are evaluated. The solution should be
clear: define %3and%* and %3or%* as special forms and handle
the evaluation ourselves. Presently, the
only way to handle special forms is to make explicit modifications
to %3eval%*. This is not terribly difficult (or desirable). When
we see a more realistic version of %3eval%* and Co. in {yonss(P30)}, we will have
a simple way to add such forms. See also {yon(P54)}.
Recognizers for the predicates must be added to %3eval%*:
.BEGIN CENTERIT;SELECT 3;
←eq[car[e];AND] → evand[cdr[e];a];
←eq[car[e];OR] → evor[cdr[e];a];
←.....%1 where:
.END
.BEGIN SELECT 3;TABIT1(16);
.P53:
evand <= λ[[l;a]\[null[l]→ T;
\ eval[car[l];a] → evand[cdr[l];a];
\ T → NIL] ]
evor <= λ[[l;a]\[null[l] → NIL;
\ eval[car[l];a] → T;
\ T → evor[cdr[l];a]] ]
.END
Notice the explicit calls on %3eval%*. This is expensive, but cannot be helped.
Later we will show a less costly way to handle those "non-functions" which
wish only to have an indefinite number of arguments, all of which are
to be evaluated (see {yonss(P18)} on macros).
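The short-circuit behavior of %3evand%* and %3evor%* can be sketched in Python, purely for illustration: the unevaluated arguments are modeled as thunks, and each is evaluated, left to right, only until the answer is determined.

```python
# A sketch (Python, illustration only) of evand and evor: the arguments
# arrive unevaluated (as zero-argument thunks) and are evaluated left
# to right only until the result is known.

def evand(args):
    for a in args:
        if not a():         # evaluate this argument now ...
            return False    # ... and stop at the first false one
    return True

def evor(args):
    for a in args:
        if a():
            return True     # stop at the first true argument
    return False
```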
←%2Problems%*
I What is the difference between a special form and call-by-name? Can call-by-name
be done in LISP (without redefining %3eval%*)?
.SS(The %3prog%*-feature,,P37:)
.TURN ON "←";
%1
Recursion seems a bit supernatural, but it is legal, mechanizable
and rather cool.
There is another similar ⊗→bind↔←ing process occurring in LISP. It is
connected with an iterative style of LISP programming called the
⊗→%3prog%*↔←-feature.
First a general disclaimer: Some pure ("For my next trick I'll walk
on the water") LISP programmers feel that there is something slightly
obscene about writing iterative style programs. This isn't quite
true, but there are some good reasons for emphasizing recursion.
%21.%* Anyone can write an iterative scheme. Recursion is a powerful
tool and very possibly a new programming technique
for you. You should exercise your skill and resort to the %3prog%*
as a last resort.
%22.%* Two of the usual trappings of iteration are %2labels%* and %2go-to%*'s.
These are truly tools of the devil. In recent years several studies
by reasonable men have shown that algorithms which resort to labels
and gotos tend to be harder to read, harder to modify, sloppy in their
analysis of the process being coded, and generally ugly. The label
and goto hack invites patching over poor analysis instead of
reanalyzing the problem.
%23.%* With the control of the loop structure (either recursion or some
kind of controlled iteration, e.g., a FOR or WHILE statement) in the
hands of the language rather than in the hands of the programmer, the
static text and the dynamic flow of the execution have a parallel
relationship. This should have a beneficial effect for studies
investigating the provability of properties of programs.
Now that this has been said, here's our discussion of %3prog%*s.
Many algorithms present themselves more naturally as iterative
schemes. Recall the recursive computation of the length of a list given
on {yon(P19)}. %3length%* seems inordinately complex; our sympathies
lie more with %3length%41%1. Even this is not as straightforward as you would
expect for such a simple calculation. Rather, consider the following:
%21.%* Set a pointer to the given list. Set a counter to zero.
%22.%* If the list is empty, return as value of the computation, the
current value in the counter.
%23.%* Otherwise, increment the counter by one.
%24.%* Set the pointer to the cdr of the list.
%25.%* Go to line 2.
The new ingredients here are:
%21.%* An ⊗→assignment↔← statement: "set a ...".
%22.%* Access to some new cells: the pointer, the counter.
%23.%* Sequencing (albeit usually implied) between statements: line %21%*, then line %22%*...
%24.%* Gotos and labels.
%25.%* An explicit exit from the procedure: line %22%*.
Here is the LISP %3prog%* version of the length function:
.BEGIN TABIT2(15,17);SELECT 3;GROUP;TURN OFF "←";
.P45:
⊗→length↔← <= λ[[x]prog[[l;c]
\\l ← x;
\\c ← 0;
\a\[null[l] → return[c]];
\\l ← cdr[l];
\\c ← c+1;
\\go[a]] ]
.END
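For comparison, the same iterative scheme can be written in Python (purely for illustration), with each statement of the %3prog%* mirrored by a line of the loop.

```python
# The prog version of length, line for line, as a Python sketch
# (illustration only): l and c play the local prog-variables, and the
# while-test plays the conditional together with go[a].

def length(x):
    l = x           # l <- x
    c = 0           # c <- 0
    while l:        # a: [null[l] -> return[c]]
        l = l[1:]   # l <- cdr[l]
        c = c + 1   # c <- c+1
    return c        # return[c]
```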
A few points should be made: The ⊗→%3prog%2-variables%1↔←, %3l%* and %3c%1, are ⊗→local variable↔←s.
That is, they only have meaning within this definition of %3length%*. If they
appeared in some program which called %3length%*, then the right thing
would happen; the old values would be saved (like λ-bindings) and then
restored after the call on %3length%* has been completed. Among other things,
this allows %3prog%*s to be used recursively.
Though assignment statements are normally executed for their effect on the
environment -- that is, they have side-effects -- assignments in LISP also have a value.
The value of an assignment is the value of the right-hand side.
Conditional expressions have a slightly different effect inside %3prog%*s. If
none of the predicates in the conditional are true, then the statement following
the conditional is executed.
Labels are clear; they are identifiers.
⊗→%3go%*↔← is a little more complicated. %3go%* is a special form. If the argument
to %3go%* is %2atomic%* then it is interpreted as a (local) label; otherwise, the argument
is %2evaluated%* and the result of the evaluation is interpreted as a label.
This results in a nice programming trick. Let %3l%* be a list of dotted pairs,
each of the form, %3(%* object%4i%* . label%4i%3)%1. At each label%4i%* we
begin a piece of program to be executed when object%4i%* has been recognized.
Then the construct:
.P25:
←%3go[cdr[assoc[x;l]]]%*,
can be used to "dispatch" to the appropriate code when %3x%* is one of the
object%4i%*. This is an instance of %2⊗→table-driven↔←%* programming.
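In modern notation the same table-driven dispatch might be sketched as follows (Python, purely for illustration; the table entries here are ours, not from the text): a table pairs each recognizable object with the piece of program to be executed for it.

```python
# A sketch (Python, illustration only) of the table-driven dispatch
# that go[cdr[assoc[x;l]]] achieves: each recognizable object is
# paired with the piece of program to run when it is seen.

dispatch = {                       # the list of (object_i . label_i)
    "ADD":  lambda a, b: a + b,    # code at label for ADD
    "MULT": lambda a, b: a * b,    # code at label for MULT
}

def run(x, a, b):
    return dispatch[x](a, b)       # "go to" the code selected by x
```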
The ⊗→%3return%*↔← statement evaluates its argument, and returns that value to
the calling program.
When a %3prog%* is entered the %3prog%*- (local) variables are initialized to %3NIL%*.
The body of the %3prog%* is a sequence of statements and labels. Statements are
executed sequentially unless a %3go%* or %3return%* is evaluated.
If a %3prog%* runs out of statements then %3NIL%* is returned.
Returning from a %3prog%* unbinds the local variables.
Now to the problem of translating %3prog%*s into a s-expression representation.
We need be a little careful. First notice that %3prog%* is a %2⊗→special form↔←%*
(like %3COND%* and %3QUOTE%*). The construct:
←%3prog[[l;c]....]%* will be translated to:
←%3(PROG(L C) ...)%*.
But %3PROG%* is not the translate of a function
any more than %3(QUOTE ...)%* or %3(COND ...)%* is.
So the body of the %3prog%* must be handled specially by a
new piece of the evaluator.
.TURN OFF "←";
Similarly we must be careful about the interpretation of ←. First,
we will write %3x ← y%* in prefix form: %3←[x;y]%*. Now, using our usual LISP
technique we might map this to:
.BEGIN CENTER
%3(SET X Y).%* Is ← or ⊗→%3SET%*↔← a function in the usual LISP sense?
.END
Not likely; for if %3x%* and %3y%* have values %32%* and %33%*, for example, then the
call-by-value interpretation of %3←[x;y]%* would say %3←[2;3]%*. This, hopefully, was not our
intention. What do we want? We want to evaluate the second argument
to ← while stopping the evaluation of the first argument. But LISP does
have a device for stopping evaluation: ⊗→%3QUOTE%*↔←. So we can define %3SET%* as a normal
LISP function, provided we
.BEGIN CENTER
translate %3x ← y%* as %3(SET (QUOTE X) Y)%*.
.END
.TURN ON "←";
For example, look at the evaluation of %3(SET (QUOTE X)(PLUS X 1)).%*
Assume the current value of %3X%* is %32.
SET%* is a function; evaluate the arguments from left to right. The
value of %3(QUOTE X)%* is %3X%*; the value of %3(PLUS X 1)%* is %33%*; assign %33%* to %3X%*.
Question: what does %3(SET Z (PLUS X 1))%* mean? Well if the current value of
variable %3z%* (represented by %3Z%*) is an identifier (non-numeric atom), then
%3(SET Z (PLUS X 1))%* makes sense. Assume the current value of %3Z%* is %3A%*, then
the effect of the %3SET%* statement is to assign the value %33%* to %3A%*.
Normally when you are making assignments, you want to assign to a %2name%* and
not a %2value%*; thus you will be using the form
←%3(SET (QUOTE ...) ...).%*
As a convenience an abbreviation, ⊗→%3SETQ%*↔←, has been made available:
←%3(SETQ ... ... )%* means %3 (SET (QUOTE ...) ...).%*
Again note that %3SETQ%* is not (the translate of) a function. It may be
defined as a special form or considered a notational convenience like %3list%*.
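The behavior of %3SET%* as an ordinary call-by-value function can be sketched in Python, purely for illustration (a dictionary stands in for the symbol table): both arguments arrive evaluated, so assigning to a %2name%* requires quoting it, and %3SETQ%* is mere surface syntax for that quoting.

```python
# A sketch (Python, illustration only) of SET as a normal function of
# two EVALUATED arguments; a dict stands in for the symbol table.

env = {"X": 2, "Z": "A"}

def SET(name, value):
    # both arguments were evaluated before we got here
    env[name] = value
    return value                 # an assignment also has a value

# (SETQ X (PLUS X 1)), i.e. (SET (QUOTE X) (PLUS X 1)):
SET("X", env["X"] + 1)           # assigns 3 to X

# (SET Z (PLUS X 1)): the first argument is evaluated, yielding A,
# so the assignment goes to A, not to Z.
SET(env["Z"], env["X"] + 1)
```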
Here is a translation of the body of the %3prog%* version of %3length:%*
.GROUP
%3
.BEGIN TABIT3(10,17,20);
\(LAMBDA(X)
\\(PROG(L C)
\\\(SETQ L X)
\\\(SETQ C 0)
\\A\(COND ((NULL L)(RETURN C)))
\\\(SETQ L(CDR L))
\\\(SETQ C (ADD1 C))
\\\(GO A) ))
.END
.APART
%1
What we are leading up to is the development of sufficient background
so that the brave few may run programs on the machine.
.REQUIRE "PROB7.PUB" SOURCE_FILE;
.SEC(Running on the machine)
%1
The %2programming language%*, LISP, is the Sexpr translation
of the LISP functions and %3prog%*s that we have been writing. What are
some of the implications of writing in Sexpr form?
First, LISP becomes the world's largest parenthesis sink. It makes
for difficult reading at times, but there are formatting tricks, and
editing programs which will help the user match parens and locate
parts of the program. (It only hurts for a little while). There is
a practical benefit which greatly outweighs this anguish factor:
since proper input to LISP programs are Sexprs and since we are writing
LISP programs in Sexpr form, then on input, data and program are
indistinguishable in format; i.e., they are both binary trees.
Obviously for evaluation, programs must have a very special structure,
but program and data are both trees, just as in most hardware machines
the contents of locations which contain data are indistinguishable
in format from locations which contain instructions.
On the bare machine it is the way in which a location is accessed
which determines how the contents of that location are interpreted.
If the central processor accesses the location via the program
counter, the contents are assumed to be an instruction. That same
location can usually be accessed as part of a data manipulating
instruction. In that case, the contents are simply taken to be data.
LISP is one of the few high-level languages which also has this property.
It gives the programmer exceptional power. Let's complete the
analogy. The central processor here is our old friend, ⊗→%3eval%*↔←.
If %3eval%* references a binary tree via its `program counter', then
that tree is decoded via the internals of %3eval%*. If a tree is
referenced as input to an Sexpr-massaging function, then it is
taken as data. As a result, a LISP program can %3cons%*-up a new
Sexpr which can then be evaluated (i.e., interpreted as an instruction).
The simplest way to communicate with such a machine is to read an
sexpr translate of a LISP expression into memory, evaluate the expression, and
print the result. Several implementations of LISP use a slight variant
of this called the "read-eval-print" loop:
←%3prog[[]##a##print[eval[read[];NIL]];go[a]]%*.
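In modern notation the loop might be sketched so (Python, purely for illustration; %3read%*, %3eval%* and %3print%* are supplied as stand-ins, not the real LISP routines):

```python
# A sketch (Python, illustration only) of the read-eval-print loop:
# read one expression, evaluate it in an empty table, print the
# result, and repeat until the input is exhausted.

def repl(read, eval_expr, printer):
    while True:
        expr = read()
        if expr is None:               # end of input ends the loop
            return
        printer(eval_expr(expr, []))   # eval[read[];NIL], then print
```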
.BEGIN TURN ON "#";
Since programming is done using the sexpr translation of LISP functions
it becomes convenient to sometimes say "...#the function %3CAR%*#..." or
"...#write a %3PROG%*#...",#... when we actually mean %3car%* and %3prog%*,#....
The actual LISP functions are written in the language generated by syntax equations
which we have been including. This language is called the meta-language, and
the LISP expressions called ⊗→M-expressions↔← or M-exprs.
The distinction between Mexprs and their Sexpr translations must not be forgotten.
.END
Though %3eval%* is the equivalent of the basic Cpu of the ⊗→LISP machine↔←,
there is another artifact in many versions of LISP to communicate
with the outside world. As with most pieces of I/O equipment,
it leaves something to be desired. Its name is %3evalquote%*.
.SS(%3evalquote%*,%3evalquote%*)
The basic structure of the %3evalquote%*-loop is:
%21.%* read in a (Sexpr representation of a) function, i.e., a name
or a lambda expression.
%22.%* read in a list of (evaluated) arguments to be applied to the function.
%23.%* apply the function (of step %21.%*) to the argument list.
%24.%* print the value.
%25.%* go to step %21.%*
Thus the structure of the loop is essentially:
.BEGIN TABIT2(24,27);TURN OFF "←";
%3
.GROUP
prog[[fn;x;z]
\l\fn ← read[ ];
\\x ← read[ ];
\\z ← evalquote[fn;x];
\\print[z];
\\go[l]]]
%1where: %3evalquote <= λ[[fn;x]apply[fn;x;NIL]]
.APART
%*
or more concisely:
%3
.GROUP
\prog[[ ]
\ l print[evalquote[read[ ];read[ ]]];
\ go[l]]
.APART
%*
.END
.GROUP
where:
%21.%* the function, ⊗→%3read%*↔←, negotiates with an input device to read an
Sexpression.
%22.%* the function, ⊗→%3print%*↔←, prints the value of its argument on an output
device.
.APART
The details of %3read%* and %3print%* will appear when we discuss implementation
of LISP.
.BEGIN TURN ON "#";
Here's an example of the behavior of %3evalquote%*. Assume that the input
device contains %2CAR#((A#B))%*. Then the first %3read%* in %3evalquote%* gets
%2CAR%* (a string representing an Sexpr), and the second read gets %2((A#B))%*;
then %3apply%* gets called as:
%3
←apply[CAR;((A B));NIL].
%*
%3apply%* says that this is the same as evaluating
%3car[(A#B)]%*, and thus returns %3A%* as its answer, which is dutifully
printed by %3print%*.
.END
So %3evalquote%* can be used as a `desk calculator' for LISP.
If we wish to evaluate an expression %3f[a%41%*;#...#a%4n%*]%1 for
%3a%4i%1 constants, i.e. sexprs, then %3evalquote[F;#(a%41%*#...#a%4n%*)]%1
will do it for us. But the %3evalquote%*-loop will %2not%* evaluate arguments;
the a%4i%1's in the call on %3evalquote%* are not expressions to be evaluated,
but are constants.
Typing %2CONS#((CAR#(QUOTE#(A)))#(QUOTE#(B)))%* to the %3evalquote%*-loop
will result in %3((CAR#(QUOTE#(A)))#.#(QUOTE#(B)))%*.
The %3evalquote%*-loop is an anomaly. It does not expect the usual representation
of a LISP form say %3(F e%41%* ... e%4n%*)%1 where the %3e%4i%1's
are to be evaluated. It expects a %2pair%* of sexprs representing a function
and %2evaluated%* arguments. The benefits of having "functional-notation" as
input and having arguments implicitly evaluated are greatly outweighed
by the confusion introduced by having two formats for LISP expressions, one for
the "top-level" and a different one everywhere else.
Certainly
before we can do any reasonable amount of calculation, we must be
able to define and name our own functions. The %3label%* operator, or
calling %3eval%* with an initial symbol table containing
our definitions will suffice, but this is not particularly elegant.
We would like our definitions to have some permanence.
A function (actually a %2special form%*, if you have been paying attention)
named ⊗→%3define%*↔←, whose effect is to add definitions to %3apply%*, has been
provided. The actual behavior of %3define%* will appear in the sections on
implementation.
.P51:
%3define%* is the name of a special form (its arguments are not evaluated)
of one argument, say %3x. x %*is a list of pairs: %3((u%41%* v%41%*) ... (u%4n%* v%4n%*)).%1
Each %3u%4i%1 is the name of a function and each %3v%4i%1 is the λ-expression
definition. Actually each %3u%4i%1 and %3v%4i%1 is the %2Sexpr translation%* of the
name and the function, but you knew that anyway. The effect of %3define%*
is to put the appropriate definitions in the symbol table.
For example we might wish to define the function which reverses a list as:
%3
.BEGIN NOFILL;TURN OFF "←";
.GROUP
reverse <= λ[[x] prog[[y;z]
y ← x;
z ← NIL;
a [null[y] → return [z]];
z ← cons[car[y];z];
y ← cdr[y];
go[a];]]
%1
.APART
Then the following would make this definition grist for the %3evalquote%* mill.
.GROUP
%3
DEFINE((
(REVERSE (LAMBDA (X)
(PROG (Y Z)
(SETQ Y X)
(SETQ Z NIL)
A(COND ((NULL Y)(RETURN Z)))
(SETQ Z (CONS (CAR Y) Z))
(SETQ Y (CDR Y))
(GO A) )))
))
%1
.APART
and subsequently:
%3REVERSE ((A B C)) %1would evaluate to: %3(C B A).%1
.END
%1
C. Weissman's LISP PRIMER is an excellent source of machine examples
and should be consulted now. Always remember that you are writing the sexpr representation
of LISP functions and expressions. Do not confuse the two.
.SS(A project: extensions to %3eval%*,%3eval%*,P67:)
.SELECT 1;
***h.s. about how wonderful LISP is for extenson***
.BEGIN TURN ON "#";
Consider a p%4i%* → e%4i%* component of a conditional expression. As
conditionals are now defined, e%4i%* must be a single expression to be evaluated.
In subsets of LISP without side-effects this is sufficient. It is, however,
convenient in practice, i.e., in LISP %2with%* side effects, to extend conditionals to include components of the
form: p%4i%*#→#e%4i1%*; ... e%4in%*.
This extended component is to be evaluated as follows: if p%4i%* is true, then
evaluate the e%4ij%*'s from left to right, with the value of the component to be
the value of e%4in%*.
For example, this feature, used in %3prog%*s would allow us to replace:
.BEGIN TABIT2(10,14);
\\ ....
\\[p%41%* → %3go[l]%1]
\\ ...
\%3l\%1e%41%*;
\\e%42%*;
\\ ...
\\%3go[m];
.END
with: [p%41%* → e%41%*;e%42%*; ... %3go[m]%*;]. This is certainly easier to read.
This extended conditional expression is available on versions of LISP 1.6 on the PDP-10.
Here is another cosmetic. Instead of requiring that the body of a λ-definition
be a single expression: %3λ[[#...#]f[#...#]]%*, allow bodies of the form:
%3λ[[#...#]f%41%*[#...#];#...#f%4n%*[#...#]]%1. The evaluation of such
should mean; bind as usual, evaluate the %3f%4i%1's from left to right, returning
as value %3f%4n%*[#...#]%1.
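The evaluation rule for such extended bodies can be sketched in Python, purely for illustration: evaluate the constituent forms left to right (for their side effects), discarding all values but the last.

```python
# A sketch (Python, illustration only) of evaluating an extended body
# f1[...]; ... fn[...]: each form is evaluated in order and the value
# of the last one is the value of the whole body.

def eval_body(forms, table, eval_expr):
    value = None
    for f in forms:
        value = eval_expr(f, table)   # earlier values are discarded
    return value
```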
This next extension to %3eval%* was derived from the syntax of ⊗→MUDDLE↔←, ⊗→CONNIVER↔←, and
⊗→MICRO-PLANNER↔←.
We have seen that LISP calling sequences are of two varieties: functions calls,
evaluating %2all%* of the arguments; and special forms, evaluating %2none%* of the
arguments.
In an attempt to generalize this regime we might allow the evaluation of some
of the arguments and enlarge on the domain
of objects which can appear in the list of λ-variables.
We might partition the formal parameters into obligatory parameters, optional
parameters, and an excess collector.
Obligatory parameters %2must%* have corresponding actual parameters; optional
actual parameters are used if present, otherwise declared default
values are used; and
if there are more actual parameters than the formals encompassed by the first
two classes, then they are associated with the excess collector.
To be more precise consider the following possible BNF equations:
.BEGIN TABIT1(13);TURN OFF "←";
<varlist>\::=[<obligatory> <optional> <excess>]
<obligatory>\::= <par>; ...<par> | %cε%*
<optional>\::= "optional" <opn>; ... <opn> | %cε%*
<excess>\::= "excess" <par> | %cε%*
<par>\::= <variable> | @<variable>
<opn>\::= <par> | <par> ← <form>
.END
The associated semantics follows:
.BEGIN INDENT 0,10;TURN OFF "←";
%21.%* The formal parameters are to be bound to the actual parameters from
left to right (as usual).
%22.%* There must be an actual parameter for each obligatory parameter, and
if there is no excess collector there may not be more actual parameters than
formals. (There may be less if we have optionals.)
%23.%* If a <variable> in a formal parameter is preceded by a "@", then
the corresponding actual parameter is %2not%* evaluated.
%24.%* We might run out of actual parameters while binding the optionals.
If we do, then we look at the remaining formal optionals.
If a formal is simply a <par> then bind it to %3NIL%*; if a formal is
@<variable> ← <form> then bind the <variable> to the <form>; or if the formal is
<variable> ← <form>, bind <variable> to the value of <form>.
%25.%* Finally, the excess collector is bound to a list of any remaining
actual parameters in the obvious way: if <par> is <variable> then form a list
of the values of the remainder; if it is @<variable>, bind <variable> to the
actual list. If there is no excess, bind to %3NIL%*.
.END
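The five binding rules above can also be sketched procedurally. Here is an
illustration in Python (a notation of our choosing; the function %3bind%* and its
argument conventions are our inventions, not part of MUDDLE, CONNIVER, or
MICRO-PLANNER). For simplicity it assumes every actual parameter arrives
already evaluated, i.e. no "@" parameters appear:

```python
def bind(obligatory, optional, excess, actuals, eval_form):
    """Bind actual to formal parameters per rules 1-5 above.

    obligatory : list of parameter names
    optional   : list of (name, default-form-or-None) pairs
    excess     : name of the excess collector, or None
    eval_form  : evaluator applied to default <form>s
    """
    env = {}
    if len(actuals) < len(obligatory):              # rule 2
        raise TypeError("missing obligatory parameter")
    for name, value in zip(obligatory, actuals):    # rule 1
        env[name] = value
    rest = actuals[len(obligatory):]
    for i, (name, default) in enumerate(optional):
        if i < len(rest):                           # actual supplied
            env[name] = rest[i]
        elif default is None:                       # rule 4: bind to NIL
            env[name] = 'NIL'
        else:                                       # rule 4: evaluate default
            env[name] = eval_form(default)
    rest = rest[len(optional):]
    if excess is not None:                          # rule 5
        env[excess] = rest if rest else 'NIL'
    elif rest:                                      # rule 2: no collector
        raise TypeError("too many actual parameters")
    return env
```

With the parameter lists of %3baz%* and %3fii%* below, this reproduces the
bindings shown in the printed examples.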
.BEGIN TURN OFF "←";
These same languages have extended %3prog%*-variables slightly, allowing
them to be initialized explicitly. If a %3prog%*-variable is atomic,
intialize it to %3NIL%*, as usual. If it is of the form <variable> ← <form>
then initialize it to the value of the <form>.
.END
.END
.BEGIN TURN OFF "←";turn on "\"; tabs 10,20;
Here are some examples:
1. In the initialization of %3length%* on {yon(P45)}, we could write:
%3...#prog[[l#←#x;#c#←#0]#....%*
2. %3list%* could now be defined as: %3λ[[%1"excess"%* x]x]%*.
3. Consider the following definition:
.BEGIN NOFILL;
.group
\%3baz <= λ[\[x;%1@%*y;%1"optional"%* z; u ← 0; %1"excess"%* v]
\\print[x];
\\print[y];
\\print[z];
\\print[u];
\\print[v]].
.APART
%1Then a call of:
%3eval[(BAZ 2 (CAR (QUOTE (X Y))) 4 5 6 7 (CAR (QUOTE (A . B))));NIL]%* would print:
%3
2
(CAR (QUOTE (X Y)))
4
5
(6 7 A)%*
(and return value %3(6 7 A)%*).
Similarly, defining:
.GROUP
\%3fii <= λ[\[x;y;%1"optional"%* z; u ← 0; %1"excess"%3 v]
\\print[x];
\\print[y];
\\print[z];
\\print[u];
\\print[v]].
%1and calling:
%3eval[(FII 2 (CAR (QUOTE (X Y))));NIL]%* prints:
%3
2
X
NIL
0
NIL.%*
.END
.END
←%2Problems%*
Design simple sexpr representations of these proposed constructs.
Make these extensions to %3eval%*. How useful are these syntactic
sugarings?
.SS(A project: Syntax-directed processes,syntax-directed)
%1
.REQUIRE "NOTES.SDP" SOURCE_FILE;
.SEC(Toward better symbol tables and implementation,symbol tables)
.TURN ON "←";
%1
*******diatribe about importance of LISP for implementors****
There are some rather gross defects in our symbol table mechanism.
First, (disregarding %3define%*) we have had to resort to subterfuge
to put the definitions of our functions in our symbol table.
Using %3label%* ({yonss(P38)}), or calling %3eval%* with an initial
symbol table, only defines the function for %2that%* particular call
on %3eval%*. When we finish the computation, the definition disappears.
From a practical point of view definitions should have some permanence.
An implemented version of LISP should know a
lot more functions than the five primitives. It should have a
reasonable class of arithmetic functions, and some commonly used
Sexpr functions (like %3length, assoc, subst, equal,%* etc.) all predefined.
If we do things right we should be able to use the mechanism for
predefining functions as the tool for adding our own definitions.
Second, the table look-up mechanism of %3assoc%*, though theoretically
sound, leaves something to be desired as far as efficiency is
concerned. Since we do wish to implement LISP, we should be concerned with
efficient operations. We will see that an efficient and powerful symbol table structure
can be described quite easily in LISP.
Third, we have already seen that special forms %3QUOTE%* and %3COND%* are
not evaluated in call-by-value style.
Currently the recognizers for special forms are encoded in %3eval%*, and
when an instance of such a form is seen the argument is passed %2unevaluated%*
to a function to perform the required operations.
It would be nice to extend this flexibility
to the user. We would like to have notation for adding λ-definitions of
special forms to our symbol table.
%3eval%* would then need a way of distinguishing a
λ-expression defining a function from a λ-expression defining a special form.
Also there are other calling sequences which are interesting and useful
and which it would be nice to have.
The current version of %3eval%* only handles function definitions. Thus
the current symbol table need only store the λ-definition and does not
need explicit information saying it is a %2function%* definition.
Fourth, conceptually we could use a more powerful mechanism. Consider
%3car[car]%* as an expression to be evaluated. Perhaps it is best just
to say the expression is ill-formed, but if you knew that %3car%* also
was currently being used as a simple variable besides being a function,
then your intuition says the first occurrence of %3car%* is a function
name and the second occurrence is the simple variable name; and if the
current value of %3car%* was, say, %3(A B)%*, then %3car[car]%* should evaluate to %3A%*.
That is, context tells us which occurrence of %3car%* is a function
and which is a simple variable.
However our current symbol table mechanism is not sufficient for
this interpretation. Clearly what is needed is the ability to
associate more than one property with the atom %3car%*. In the above example
%3car%* has (at least) two %6properties%* or %6attributes%*. It has a function
value and it has a simple value. Since %3car%* is a primitive function,
its function value is the location of the machine code to carry out
the %3car%* function; the simple value is the sexpr currently bound to
the variable, %3car%*.
Notice that the current table mechanism is again deficient. Using %3assoc%*
we would only find the %2first%* binding of a variable. It will either
be a λ-expression (function) or a simple value. Context will determine
how %3eval%* will interpret what is found.
These last three desires, for permanence, generality, and for multiple properties,
can be handled by the same mechanism. Clearly we could continue using an
%3assoc%*-like table, though more complicated, storing information
about what kind of value each dotted pair is. However our desire for
efficiency would certainly not be satisfied. Rather, we will
design a new symbol table which can store with each atom sequences of values and
their types. This gives us multiple properties.
We will make the table %2global%*
to %3eval%* and %3apply%*; these functions will now implicitly refer to that table
and thus need not have an explicit symbol-table argument. This
gives us permanence.
We will designate an indicator for function definition and an indicator
for special form definition and expand %3eval%* and %3apply%* to recognize
these indicators. This gives us generality.
Most implementations of LISP allow this association of
arbitrary sequences of %6attribute-value%* pairs with an atom.
It is a very good idea.
How can we associate an arbitrary collection of objects with
something? With each atom we will
associate a list called the %6⊗→property-list↔←%* or p%6-list.%* But atoms
can't be represented as just any other list. Among other things,
we must be able to recognize the occurrence of atoms so that we can
implement the predicate, %3atom%*. So atoms are special lists: the
%3car%* (or first element) of the representation of each atom is an
indicator used exclusively for the beginning of atoms.
We will use %7α%* to designate the beginning of an atom.
The %3cdr%* of the representation of the atom is the property list.
The elements of the p-list are associated in pairs. The first element is
the attribute (or property or indicator); the next element is the value.
For example, here
is part of the atom-structure for %3car%* assuming %3car%* is also being
used as a simple variable and has current value, %3(A B)%*:
.group
.GROUP SKIP 6;
.P50:
←%2Atom-structure for %3CAR%1.
.apart
The indicator, ⊗→%3SUBR%*↔←, tells us that %3car%* is a machine
language function. The indicator, ⊗→%3VALUE%*↔←, tells us that %3car%* is
also being used as a simple variable.
The reason for not pointing directly at %3(A B)%* is described
in {yonss(P5)}.
It should now be clear how
to program the primitive predicate, %3atom%*: "is %3car%* of the argument
to ⊗→%3atom%*↔← the special atom-indicator?" If it is then %3atom%* gives
value %3T%*, otherwise %3atom%* gives value %3NIL%*.
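This representation, and the predicate %3atom%*, can be modeled directly.
The following is a sketch in Python (purely illustrative; the sentinel object
plays the role of %7α%*, and the names are ours):

```python
# An FS cell is modeled as a two-element list [car, cdr]; an atom is a
# cell whose car is a unique sentinel standing for the atom indicator.
ALPHA = object()                     # the special atom indicator

def make_atom(p_list):
    return [ALPHA, p_list]           # cdr of an atom is its p-list

def atom(x):
    """The primitive predicate: is car of the argument the indicator?"""
    return x[0] is ALPHA
```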
.P29:
How about the representation of non-primitive functions?
Non-primitive functions are defined using λ-expressions. What we do is
define a new indicator, ⊗→%3EXPR%*↔←, which tells
the evaluator that the function is a λ-expression. For example,
here is part of the atom-structure for %3FACT%*, the Sexpr form
of the factorial function.
.GROUP
.GROUP SKIP 15;
←%2Atom-structure for %3FACT%1.
.APART
This picture should tell you at least two things: one, that programs
are (almost) stored as binary trees, and two, it should tell you what the function,
⊗→%3define%*↔←, does. %3define%* ({yon(P51)}) puts the λ-definition under the indicator %3EXPR%*
on each of the named atoms (functions).
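Under this scheme, retrieving or installing a property is a linear scan down
the p-list, and %3define%* amounts to storing the λ-definition under the
indicator %3EXPR%*. A Python sketch (the names %3get%* and %3putprop%* follow
LISP usage, but the flat-list representation of the p-list is our own
simplification):

```python
def putprop(p_list, indicator, value):
    """Install value under indicator, replacing any existing pair."""
    for i in range(0, len(p_list), 2):
        if p_list[i] == indicator:
            p_list[i + 1] = value
            return value
    p_list.extend([indicator, value])
    return value

def get(p_list, indicator):
    """Return the value stored under indicator, or None if absent."""
    for i in range(0, len(p_list), 2):
        if p_list[i] == indicator:
            return p_list[i + 1]
    return None

# define would then put the λ-definition under EXPR:
fact = ['PNAME', 'FACT']
putprop(fact, 'EXPR', ['LAMBDA', ['X'], '...'])
```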
The world becomes much simpler if we store each atom uniquely.
This is the reason for saying "almost" above. That
is, every reference to the atom %3A%* is actually a pointer to the same
"binary tree", the binary tree whose %3car%*-part is the special atom indicator,
and whose %3cdr%*-part is the ⊗→p-list↔← for the atom %3A%*. Thus %3(A . A)%*, which
.GROUP
we have been representing as:
.BEGIN NOFILL;TURN ON "\";TABS 30;
\%7[~~~][~~~]
\%3 A A
%1
is more realistically represented as:
\%7[~~~][~~~]
\%1 to atom-structure for %3A%1.
.END
.APART
.P48:
So the internal structures of this implementation of LISP are not binary trees;
there are intersecting branches. Structures of this sort
we will call %2⊗→list structure↔←%*.
LISP thus deals generally with binary list structure.
Trees are a subset of list structures.
We will see in a moment that many of the
computations which we have been performing have also been generating
list structure.
Question: Assume we have the above dotted pair as a value of a variable %3x%*
and we wish to print the value of %3x%*.
Clearly we would hope to see "%3(A . A)%*" appear on the output device.
We would expect that the print routine, named %3print%*, would be given
a pointer to the dotted pair. %3print%* can recognize that it %2is%* a dotted
pair since %3car%* of it is not %7α%*.
But how can %3print%* distinguish %3(A . A)%* from %3(B . B)%*, for example?
The pointer in the preceding diagram will point to %3A%* in one case
and to %3B%* in the other but nothing in our representation tells us %2what%*
to print. The simplest thing to do is to store in
each atom-structure some information telling what the name of the atom is.
This is done with another indicator called ⊗→%3PNAME%*↔←, standing for
%2⊗→print-name↔←%*. Thus the atom %3BAZ%* will have at least the following:
.GROUP
.GROUP SKIP 6;
←%2Atom-structure for %3BAZ%1.
.APART
.TURN OFF "#";
The indicator is called %3PNAME%* because the print-name (or ⊗→p-name↔←)
of the atom is what the LISP output routine will print as the name of
the atom. %2BAZ##%* means a memory location
containing some encoding of the letters %2B%*, %2A%*, and %2Z%*.
The symbol, %2#%*, represents some non-printing character; thus we
are assuming a location can contain five characters.
We represent the print-name as a list so that we may allow atoms with
p-names of unbounded length. Thus the p-name for %3BLETCH%*
would be:
.TURN ON "#";
.GROUP
.GROUP SKIP 6;
←%2P-name structure for %3BLETCH%1.
.APART
How do we get atoms stored uniquely? ⊗→%3read%*↔← does it. On reading
an atom name (which is the way almost all atoms come into existence),
the %3read%* program checks the current symbol table. If the atom does
not appear we add a new entry consisting of the p-list with a single
attribute-value pair ---#the#%3PNAME%*,#p-name#pair#---. If the atom %2does%*
appear, we bring back a pointer to the appropriate entry.
Before talking more about %3read, print%* and the other input-output functions in LISP,
we should discuss the storage of list structure. The implementation
described closely follows that for the PDP-10, though most of
the description is machine independent.
First, though Sexprs and now atoms are represented as binary list structure, there will
be quantities which we wish to represent in memory which are not
structures. For example, numeric constants, and the actual encoding of
print-names for atoms are not structures. These quantities will be stored
in an area called %2⊗→full word space↔← (⊗→FWS↔←)%*. Most of the business of
LISP is massaging of list structure and binary trees, however; such nodes, which have a
%3car%*-part and a %3cdr%*-part are stored in a separate area called
%2⊗→free-space↔← (⊗→FS↔←)%*.
The name, free space, will become apparent in a moment. Each machine
word in FS is divided into two parts; the left half contains a pointer to
the %3car%*-part and the right half contains a pointer to the %3cdr%*-part.
The pointers may point into FS or FWS. In more detail, consider this
representation of
.GROUP
←%3(A .(B . C)):%*
.GROUP SKIP 10;
←%2A representation of %3(A .(B . C))%1.
.APART
Obviously the actual addresses are irrelevant. If we replace the addresses with pointers
to the appropriate cells the resulting diagram will contain the same information.
The origin of the LISP "box notation" should be obvious.
We can now describe an implementation of %3eq%*.
Since atoms are stored uniquely, the predicate, ⊗→%3eq%*↔←,
need only check that its arguments are both atomic and both point
to the same binary tree or location in memory. In most implementations
the check for atom-ness is suppressed and a simple address comparison
is made. Non-atomic sexpressions are not usually stored uniquely. Thus
←%3eq[(A . B);(A . B)]%* is usually false, but
←%3eq[x;x] %*is usually true for any %3x%*.
Please note that we are %2not%* changing the definition of %3eq%*. The
preceding remarks deal only with the usual implementation of %3eq%*.
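The distinction shows up in any language with both an identity comparison and
a structural one. In Python, for instance (an analogy only, using lists to
stand for separately constructed copies of a dotted pair), %3is%* plays the role
of the usual address-comparison implementation of %3eq%*, while %3==%* is
structural equality:

```python
x = ['A', 'B']        # one copy of (A . B), roughly
y = ['A', 'B']        # a second, separately constructed copy
print(x == y)         # True:  structurally the same
print(x is y)         # False: eq[(A . B);(A . B)] is usually false
z = x
print(z is x)         # True:  eq[x;x] is always true
```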
The implementation of %3car%* and %3cdr%* present no problem.
Simple pointer manipulation will suffice.
.GROUP
.GROUP SKIP 6;
←%2The effect of %3car%1.
.APART
.P28:
Finally, a remark about conditional expressions. Most versions of LISP
relax the behavior of the predicates, p%4i%*, appearing in conditional expressions
such that instead of testing for the values %3T%* and %3NIL%* we
test only for %3NIL%* and non-%3NIL%*. That is, anything other than %3NIL%*
satisfies the p%4i%*. This allows such coding tricks as:
.BEGIN SELECT 3;TABIT1(30);TURN OFF "←";
\[x ← cdr[x] → ...]
%1which is equivalent to:%*
\x ← cdr[x];
\[not[null[x]] → ...]
.SELECT 1;
(Recall that the value of "←" is the value of the RHS.)
.END
As with most coding tricks, this one should be avoided like the plague.
Again, please note that this discussion is not a change to our definition
of conditional expression, but an implementation dependent feature.
.<<pic of NIL>>
.SKIP TO COLUMN 1;
.SS(A picture of the atom %3NIL%*,%3NIL%*)
We have been writing the atom %3NIL%* as:
.GROUP SKIP 12;
Where the atoms for %3PNAME%* and %3VALUE%* are represented as:
.GROUP SKIP 12;
More realistically we should represent %3NIL%* as:
.NEXT PAGE;
.SS(A first look at %3cons%1)
%1
The effect of the ⊗→%3cons%*↔← function is quite different from that
of the other primitive LISP functions. The other functions (or
predicates) manipulate existing Sexpressions, whereas %3cons%*
must construct a new sexpression from two existing sexprs.
That is, given representations of two sexprs, say %3x%* and %3y%*, %3cons[x;y]%*
is to get a new cell, put the representation of %3x%* in the %3car%*-part of
the cell and the representation of %3y%* in the %3cdr%*-part:
.BEGIN TURN ON "\";NOFILL;TABS 10,30,45;
.GROUP
result of %3cons[x;y]%*
\\%7[~~~~~][~~~~~]
\ %1rep. of %3x\\ %1rep. of %3y%1
.APART
.END
Where is %3cons%* to obtain these new cells? They come from the ⊗→free
storage area↔←. Within the free storage (FS) area reside all the cells
which have a %3car%*- and %3cdr%*-part. The cells which contain the actual
p-names, we know, reside in the full word space (FWS). At any point
in a LISP computation there are cells in the FS area which do not contain
any parts of the user's computation. In particular, when the computation
is begun, only the atom structure for the initial LISP symbol table
uses cells in the FS area. The remaining FS cells form a %2free-space list.%*
Whenever a %3cons%* needs a cell the first cell in the FS list is used and
the FS list is set to the %3cdr%* of the FS list.
As the computation continues, more and more cells are taken from the FS list.
When a %3cons%* operation asks for a cell and the FS list is empty, the
computation is suspended and the %2⊗→storage reclaimer↔←%* or %2⊗→garbage collector↔←%* is
called.
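This allocation discipline can be sketched in a few lines. Here is an
illustration in Python (the cell class and the size of the free list are our
inventions):

```python
class Cell:
    """A free-space word: a car-part and a cdr-part."""
    __slots__ = ('car', 'cdr')

# Chain some cells together through their cdr-parts: the FS list.
free_list = None
for _ in range(100):
    c = Cell()
    c.car, c.cdr = None, free_list
    free_list = c

def cons(x, y):
    """Take the first cell off the FS list and fill it in."""
    global free_list
    if free_list is None:
        raise MemoryError("FS list empty; time to call the garbage collector")
    cell, free_list = free_list, free_list.cdr
    cell.car, cell.cdr = x, y
    return cell
```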
.SS(Garbage collection,garbage collection)
During the course of a computation, contents of cells which were taken
from the FS list often become unnecessary. For example during the evaluation
of something as simple as:
←%3(CONS (QUOTE A)(QUOTE B)),%* many cells are used:
%21.%* At least seven cells are needed just to read in the expression.
If some of the atoms are not present in the symbol table,
more cells will be needed.
%22.%* One cell will be needed to perform the %3cons%* operation.
After the computation is completed, none of the eight mentioned cells are needed.
They are garbage.
Why not simply return these `garbage cells' explicitly to the FS list?
Frequently it is very difficult to know just exactly which cells
%6are%* garbage. Indeed, experiments have been performed in which LISP
programmers were allowed to return `garbage' to the FS list themselves.
The results were disastrous; list structure thought to be garbage
was returned to the FS list by the programmers, unaware it was still being
used by other computations. We will see where these difficulties arise
later.
Thus reclamation is passed to the LISP system. The %3cons%* function and
its FW space counterpart, %3fwcons%*, are the only functions allowed
to manipulate the free storage lists. When either list becomes empty, the
garbage collector is called.
%2←The fundamental assumption behind garbage collection is:%*
.NARROW 10,10;
At any point in a LISP computation, all cells which contain
parts of the computation are reachable (by %3car-cdr%* chains)
through a fixed set of known cells or registers.
.WIDEN;
The first phase of the garbage collector, called the %2⊗→marking phase↔←%*,
marks all of the list structure which is currently active (reachable
through the above mentioned fixed cells). This should include all the
atoms in the symbol table and all the cells reachable by %3car-cdr%* chains
from any of these atoms. Any partial computations which have been
generated must also be marked. Once all the active structure has been
marked, we proceed to the second phase, the %2⊗→sweep phase↔←.%*
Once the marking has been completed, we go linearly (sweep) through memory,
collecting all those cells which have not been marked. Again,
the assumption is that if cells are not marked in the first phase,
then they do not contain information necessary to the LISP computation.
These unmarked cells are chained together via their %3cdr%*-parts to form
a new FS list. The FS pointer is set to the beginning of this list;
similarly, the unmarked cells in FWS comprise the new FW list.
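The two phases can be sketched abstractly. In the following Python
illustration (the representation of cells as dicts, and of the heap and free
list as Python lists, is ours) marking is done with an explicit stack rather
than recursion:

```python
def garbage_collect(roots, heap):
    """Mark everything reachable from the fixed root cells, then sweep
    the heap linearly; the unmarked cells form the new free list."""
    marked = set()
    stack = list(roots)
    while stack:                          # phase 1: marking
        c = stack.pop()
        if not isinstance(c, dict) or id(c) in marked:
            continue                      # atoms and marked cells: no work
        marked.add(id(c))
        stack.append(c['car'])            # follow car-cdr chains
        stack.append(c['cdr'])
    # phase 2: sweep, collecting every cell not marked above
    return [c for c in heap if id(c) not in marked]
```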
A more detailed discussion of this garbage collector
will be given when we examine the details of implementation in {yonss(P26)}.
And a more complex algorithm is discussed in {yonss(P27)}.
Garbage collection appears to be a very complex and time-consuming
process. Indeed it is. As indicated above, there are alternatives in some
applications. If the data-structure manipulations are particularly simple
then leaving storage management in the programmer's hands might
suffice. However, LISP generates very intertwined structures.
Perhaps there is
another way to allow the system to handle storage management.
First notice that storage management becomes quite simple if there is no sharing
of sublists. The question of when to return a list to the free space list
is easily solved. However it is desirable to share substructure quite often.
Instead of using a garbage collector, we might associate a counter, called
a %2⊗→reference counter↔←%*, with each
list when it is built. In that counter we will store the number of references
to that list. Thus the counter will be initialized to 1 when the list is created.
Whenever the list
is shared we increase the counter by 1; whenever a reference to
the list is given up we decrease the counter by 1.
When the count goes to 0 we can put the cells of the list in the free space list.
One of the most glaring defects in this reference counter scheme is the
prohibition of circular lists. Consider the following sequence:
.BEGIN NOFILL;GROUP;TURN OFF"←";
%21.%* Manufacture a list,%3x%*: %3x ← (B I G M O T H E R L I S T)%1. Reference count is 1.
%22.%* Circularize it: %3x ← circle[x];%*. Reference count is now 2.
%23.%* Delete all references to %3x%*: %3x ← NIL%*. Reference count reverts to 1,
but the list is not referred to, is not on the free space list, and has thus
been lost to the system.
.END
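The three steps above can be traced with explicit counters. A Python sketch
(the class and the bookkeeping are ours) of why the circular list is lost:

```python
class Node:
    def __init__(self):
        self.refcount = 1    # a list's count is 1 when it is created
        self.next = None

head, tail = Node(), Node()
head.next = tail             # step 1: build the list

head.refcount += 1           # step 2: circularize -- the cycle itself
tail.next = head             #         now references head; count is 2

head.refcount -= 1           # step 3: x <- NIL drops the last outside
                             #         reference; count reverts to 1
# Nothing outside the cycle can reach these cells, yet the count never
# reaches 0, so they are never returned to the free space list: lost.
```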
←%2Problems%*
I This problem deals with what is known in LISP as %2⊗→hash consing↔←%*.
We have been storing atoms uniquely, but it should be clear from the behavior
of %3cons%* that non-atomic sexprs are %2not%* stored uniquely.
Certainly storing single copies of any sexpr would save space. For example,
the non-atomic structure of
%3((A . B).(A . B))%* could be represented with two cells rather than three.
Unique storage is not without its difficulties. What problems do you foresee
in implementing such a scheme?
II We said on {yon(P48)} that many LISP computations generate list structure
rather than true binary trees. Give an example.
.P55:
III Can you write a LISP function %3circle <= λ[[x] ... %* which will take
a list %3x%* and make it circular? Thus:
.BEGIN TABS 8,30,47,65;NOFILL;TURN ON "\";GROUP;
%7
\[~~~~~~~[~~~]\[~~~]~~~]\[~~]~~~]\[~~~~]~~~]
%3\NOTHING\ CAN\ GO\ WRONG
.END
This list is circular on its "last two" elements.
Needless to say printing such structures is not appreciated.
.SS(%3read%* and %3print%*,,P13:)
.TURN ON "←";
When you begin to implement LISP you find that a very significant
part of LISP can be written in LISP. We have already seen that the
evaluation process is expressible this way.
Here we will see that the majority of the I/O routines can be
written as LISP functions calling a very few primitive routines.
The primitive routines can also be described in `pseudo-LISP'.
That is, we will give a LISP-like description of them, though
they would normally be coded in machine language.
The primitive functions are ⊗→%3ratom%*↔← (read an atom) and ⊗→%3prin1%*↔←.
.BEGIN INDENT 0,10;
.GROUP
%3ratom[ ]%* is a function of no arguments. It reads the next atom
or special character (left paren, right paren or dot).
It looks up that object in the symbol table and returns
a pointer to that symbol table entry. If no entry
is found an appropriate entry is made.
%3ratom%* is the LISP ⊗→scanner↔←. %3ratom%* flushes spaces and commas,
using them only for delimiters. It returns only atoms or special characters
to %3read%*, the LISP ⊗→parser↔←.
.APART
.GROUP
%3prin1[x]%* is a function of one argument expecting an atom, left paren
right paren, blank, or dot as the value of its argument.
It will print the p-name of that object on the output
device.
.APART
.END
To make life simpler we need to be able to refer to atoms whose values are
the characters "%3)%*", "%3(%*", ".", and " "(blank).
We will assume that ⊗→%3rpar%*↔←, ⊗→%3lpar%*↔←,
⊗→%3period%*↔←, and ⊗→%3blank%*↔← are such atoms. For example, if the next input
character is "(" then
←%3eq[ratom[ ];lpar]%* is true (and the input pointer is moved to
the next character!).
%3prin1[period]%* will have the effect of printing a %2"."%* on the output
device.
We will discuss the structure of %3ratom%* and %3prin1%* in a moment. In
the meantime here are %3read%* and %3print%*:
.BEGIN TABIT2(17,20);TURN OFF"←";SELECT 3;
.GROUP
⊗→%3read%*↔← <=λ[[ ]prog[[j]
\j ← ratom[ ];
\[eq[j;lpar] → return[read1[ ]];
\ or[eq[j;rpar];eq[j;period]] → LOSEBIG1;
\ T → return[j]]]]
.APART
.GROUP
⊗→%3read1%*↔← <= λ[[ ]prog[[j;k]
\\j ← ratom[ ];
\\[eq[j;lpar] → return[cons[read1[ ];read1[ ]]];
\\ eq[j;rpar] → return[NIL];
\\ eq[j;period] → go[rd];
\\ T → return[cons[j;read1[ ]]] ];
\rd\k ← ratom[ ];
\\[eq[k;lpar] → k ← read1[ ];
\\ or[eq[k;rpar];eq[k;period]] → LOSEBIG2];
\\j ← ratom[ ];
\\[eq[j;rpar] → return[k];
\\ T → LOSEBIG3] ]]
.APART
.END
Here's %3print%* and friends. %3terpri%* gets a new output line.
(note: the value of %3print[x]%* is %3x%*.)
.BEGIN TABIT2(17,20);TURN OFF "←";SELECT 3;
.GROUP
⊗→%3print%*↔← <= λ[[x]prog[[ ]
\prin0[x];
\terpri[ ];
\return[x]]]
.APART
.GROUP
⊗→%3prin0%*↔← <= λ[[x]prog[[j]
\\[atom[x] → return[prin1[x]]];
\\prin1[lpar];
\\j ← x;
\a3\prin0[car[j]];
\\[null[cdr[j]] → go[a6]];
\\prin1[blank];
\\[atom[cdr[j]] → go[p1]];
\\j ← cdr[j];
\\go[a3];
\p1\prin1[period];
\\prin1[blank];
\\prin1[cdr[j]];
\a6\prin1[rpar];
\\return[x] ]]
.APART
.END
So far we have thrown all the I/O back on %3ratom%* and %3prin1%*. Clearly
%3ratom%* will be more interesting. All %3prin1%* need do is get the p-name and
print it. %3ratom%* needs to search the symbol table (efficiently hopefully),
and if
the atom is not found, add it to the table. All %3ratom%* has to work
with is the actual character string (called %3chr-str%* in the future) which
will be the p-name of some atom. What %3ratom%* could do is look at the
p-name of each atom currently in the symbol table; when it finds a match
it returns a pointer to the beginning of that atom.
(Notice this is essentially the search scheme of %3assoc%*.)
If the appropriate
atom is not found, it can build a new one consisting of the p-name, add it
to the table, and return a pointer to this new atom. This would
have the appropriate effect; each atom would be stored uniquely, but
the efficiency would leave something to be desired.
←%2Problems%*
***** non-recursive reader...better worse***
*** slashifying ***
.SS(Hashing,hashing,P14:)
Symbol table lookup is exactly the problem of looking up
words in a dictionary. The above scheme of %3assoc%* (linear search)
is analogous to a person beginning at page one of the dictionary and
proceeding linearly (page-by-page and word-by-word) through
the book until he found the word in question. Truly a losing
scheme. What we normally do is look at the first character of the
word and go immediately to the subsection of the dictionary which
has the words beginning with that character. We know that if
we cannot find the definition of our word in that subsection we need
look no further in the book. Usually we delimit our search even
further by keying on subsequent characters in the word. Finally
we may resort to linear search to pin down the word on a specific
page or column.
What we want is a similar scheme for the machine. We might in fact
mimic the dictionary search, subdividing our table into 26 subsections.
We can do better.
Since it is the machine which will subdivide and
index into the symbol table, we can pick a scheme which is computationally
convenient for the machine. Besides being convenient, the scheme should
result in rather even distribution of atoms in the subsections.
Obviously if the majority of the atoms end up in the same partition
of the table we will have gained little towards improving the
search efficiency. Now for terminology. Each of the partitions
or subdivisions of the table is called a %2⊗→bucket↔←%*. Each atom will
appear in at most one bucket. The computational scheme or function
used to determine which bucket a particular atom belongs in is called
a %2⊗→hashing function↔←%*. All ⊗→%3ratom%*↔← has to work with is %3chr-str%*, the
encoding of the actual name of the atom. So the hashing function
will use %3chr-str%* to determine which bucket to examine. If the atom
with ⊗→PNAME↔← %3chr-str%* does not appear in that bucket we are assured
that it does not appear anywhere in the table. In this case we
make up the minimal table entry --#atom#indicator, %3PNAME%*, p-name structure#--
and add it to that bucket.
There are lots of hashing functions. Here's one:
%21.%* Assume that we have N+1 buckets, numbered 0, 1, 2 ... N.
%22.%* Take the representation of %3chr-str%* (it's a number) and
divide it by N+1.
%23.%* Look at the remainder. It's a number between 0 and N.
%24.%* Use that remainder as the index to the appropriate bucket.
This is a scheme frequently used in LISP. Given the bucket number,
we then run down the list of atoms in that bucket, comparing
print-names against %3chr-str%*. If the atom is not found, %3cons%*-up
the atom and stick it in the bucket.
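The whole scheme --- hash on the print-name, then search linearly within the
one bucket --- fits in a few lines. A Python sketch (the bucket count, the
character encoding, and the dict-per-atom representation are arbitrary choices
of ours):

```python
N = 64                                   # any number of buckets will do
oblist = [[] for _ in range(N)]          # the object list, as an array

def hash_pname(chr_str):
    """Treat the encoded name as a number; the remainder on division
    by the number of buckets is the bucket index."""
    n = 0
    for ch in chr_str:
        n = n * 256 + ord(ch)
    return n % N

def ratom_lookup(chr_str):
    """Find the atom named chr_str, creating a minimal entry (just the
    PNAME pair) if it is absent; atoms thus stay unique."""
    bucket = oblist[hash_pname(chr_str)]
    for entry in bucket:
        if entry['PNAME'] == chr_str:    # compare print-names
            return entry
    entry = {'PNAME': chr_str}           # cons-up the atom
    bucket.append(entry)                 # and stick it in the bucket
    return entry
```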
There is a lot of mystery and history about hashing and other means
of searching and storing symbols. References are given in the
bibliography.
In review, here is the structure of the LISP input mechanism:
%21.%* %3read%*, the LISP ⊗→parser↔←, builds a list-structure representation
of the input string. %3read%* is defined in terms of %3ratom%*.
%22.%* %3ratom%*, the LISP ⊗→scanner↔←, builds atoms and special characters
from the input. It searches and augments the LISP symbol table.
%23.%* The LISP symbol table, usually called ⊗→%3OBLIST%*↔← (for %2⊗→object list↔←%*),
is a list of buckets. Each bucket is a list of the atoms which `hash' to that bucket.
We actually represent the object list as an array named %3oblist%*.
This way the hash number will give us the array subscript and we can
go to the right bucket immediately.
We won't have to go "cdr-ing" down the object list to get to the right bucket.
See figure on next page.
Here is a `pseudo-LISP' version of ⊗→%3ratom%*↔←:
.BEGIN TABIT2(17,20);FILL;
%3hash%*\is a function returning the bucket number of its argument.
%3insert%*\is a function to build the atom and insert it into the bucket.
%3right-one%*\is a predicate used to check if an atom
has the right print-name.
.NOFILL
.TURN OFF "←";
.SELECT 3;
⊗→ratom↔← <= λ[[ ]prog[[z;i;bucket]
\\i ← hash[chr-str];
\\bucket ← oblist[i];
\a\[null[bucket] → return[insert[chr-str]]];
\\z ← get[car[bucket];PNAME];
\\[right-one[z;chr-str] → return[car[bucket]]];
\\bucket ← cdr[bucket];
\\go[a]]]
.END
%3get%* is a LISP system function. It is described on {yon(P52)}.
.SKIP TO COLUMN 1;
.SS(A review of the structure of the LISP machine,LISP machine)
.BEGIN TABIT2(10,20);GROUP;
\_______________________________________
\\⊗→%3eval%*↔← and friends
\\⊗→%3read%*↔← and ⊗→%3print%*↔←
\\the ⊗→garbage collector↔←
\\the base registers for marking
\\ ⊗→FS pointer↔←
\\ ⊗→FWS pointer↔←
\\ symbol table(⊗→%3OBLIST%*↔←) pointer
\\ registers for partial results
\_______________________________________
\\⊗→Free space↔←
\\ the free space list
\\ those parts of sexprs containing
\\ %3car%*- and %3cdr%*-parts.
\_______________________________________
\\⊗→Full word space↔←
\\ the full word space list
\\ atom print names
\\ numbers
\_______________________________________
.END
.NEXT PAGE;
.SS(More on symbol tables,symbol tables)
←%2Subtitled: But my Fortran program doesn't do all this crap!%1
It is quite true that a running Fortran, PL/1, or Algol program
is far less complicated as far as its symbol accessing mechanisms
are concerned. In Fortran there is a simple relationship between
variables and memory locations which will contain their values;
in Algol, there is a simple relationship between variables and positions
in the run-time stack.
In LISP, both the quality and the quantity of variables can change.
Arbitrary properties can be associated with atoms at run-time.
Indeed, the symbol table mechanism of LISP is more reminiscent of
that associated with the Fortran or Algol compiler. For these less
sophisticated languages it is the compiler which performs the mapping
from source language to running machine code. It is the compiler's
responsibility to discover the properties associated with each variable.
The compiler can do this because the semantics of the language is such
that at compile time all (or almost all) properties of the variables are known.
This is not true for LISP. In general you cannot tell until run time what
the attributes of a particular atom are. The situation is really even worse
than this. Since programs and data are indistinguishable, we can %3cons%*-up a
list, assign it to a variable, then turn around and use that variable as
a function. Try that in Fortran.
Now the world isn't all this grim. There are LISP compilers (written
in LISP naturally). They can make many decisions at compile time
about the properties of variables. But in general the compiled code
will be interspersed with calls on %3eval%*. That is, %3eval%* will have to make
some decisions at runtime, because the compiler just doesn't know what to do.
This implies that compiled and interpreted code must be able to
communicate with each other. If a piece of compiled code calls a
λ-expression or conversely, the right thing must happen. The execution
of the program should be totally transparent as to whether any, or all, or
none of the functions are compiled. This means that the calling
sequences for both kinds of functions must be compatible. Less
obvious and by far more troublesome, is the communication of global
values between functions.
.SS(Global Variables,global variables)
A variable used globally in a function (or %3prog%*) is one which
does not appear in the λ-list or in the list of %3prog%*-variables.
For example in:
←%3λ[[x]*[x;y]], y%* is global.
Communication of global variables between interpreted functions
is not problematic: just look down the atom structure for the %3VALUE%*-
cell. Handling of global variables by compiled functions requires some
care. The ⊗→%3VALUE%*-cell↔← mechanism is designed to accomplish this.
What we will do now is:
%21.%* Introduce a graphical language, based on ⊗→AMBIT/G↔←, to describe
many of the operations of LISP ({yonss(P26)}).
%22.%* Describe parts of the dynamics of evaluation using the ⊗→Contour
Model↔← ({yonss(P31)}).
%23.%* Describe the mechanism for user defined special forms, and another
function mechanism called ⊗→macros↔← ({yonss(P18)}).
%24.%* Give a new improved definition of %3eval%* and Co., using the
new symbol table mechanism. Show how the ⊗→bind↔←ing mechanism
for λ-expression evaluation operates ({yonss(P30)}).
%25.%* Introduce a simple stack machine, %aSM%*, mapping the graphical
descriptions onto a ⊗→PDP-10↔←-like machine ({yonss(P33)}).
%26.%* Write a LISP function which will operate as a ⊗→compiler↔←, translating
the Sexpr representations of LISP functions into `equivalent' sequences
of %aSM%* code ({yonss(P32)}).
%27.%* Then we can see how the problem of global variables is handled ({yonss(P5)}).
.SS(AMBIT/G,AMBIT/G,P26:)
.SELECT 1;
AMBIT/G⊗↓an ambit is half animal, half hobbit←
is a graphical language for the description of both
data and algorithms. This outline will cover only a few aspects
of the complete language.
.TURN ON "#";
First the data representation is a generalization of the
%2box notation%* of LISP. We have already seen that it is often
helpful to draw pictures to understand the data representation
in a particular problem. Instead of requiring that we draw all
nodes in a tree as boxes, we might convey more of our intuition
by drawing nodes of different shapes for nodes which contain
different kinds of information. We have already begun to do
this on a small scale. Words (nodes) in Full Word Space
are drawn:###%7[~~~~~~]%*### whereas nodes in free space are
drawn:###%7[~~~~]~~~~]%*### even though as far as most machines are
concerned, both kinds of nodes map into the same kind of memory cell.
.TURN OFF "#";
AMBIT/G exploits this idea of shapes. This is interesting but in
itself gives us no new power. The innovation is that we can
also describe our algorithms as graphical transformations of the
data graphs. The basic statements of the language are %2pattern-match-
and-replacement%* rules. That is, if such and such a pattern can be
found in the current state of the data graph, then replace it with
a new pattern. The only kind of replacement allowed is the %2swinging%*
of an arrow so that its head moves from one node to another. The
tails of the arrows are fixed (as are the number of data nodes
and arrows). The individual statements are combined into an
algorithm by success and failure conditions. If a pattern-match-
replacement is successful then we go to one AMBIT/G statement,
if the match fails then we go to another statement.
The other two components of an AMBIT/G program are declaration
and initialization. We declare the shapes which will be used in
the algorithm and declare the number and relative positions of the
arrows which can radiate from each type of node. The initialization
(which we usually suppress) sets the initial pattern of the data graph.
An example is long overdue:
.<<pic of AMBIT/G for car>>
.GROUP
.BEGIN NOFILL;
----------------------------------------
| ⊗→%3car%*↔← | |
|------ |
| |
| |
| %7A%4AC1%1===>%7[~~~~]~~~~]%* |
| \ // |→ F
| \ // |
| \ // |
| -→ O<== |
| |
-----------------------------------------
↓
S
.END
.APART
This algorithm represents the action of the %3car%* routine. Arrows
which are suppressed are not needed for matching. The solid links
represent links which must be found if the rule is to succeed and the
dotted links represent links which are to be set (as solid links) if
the match succeeds. The exit labeled S is used if all goes as
planned; the exit labeled F is taken if we cannot match successfully.
The circle is a notational convenience: it matches any shape.
Thus in this example, %3car%* would succeed as long as AC1 points to a
cell in free space.
On following pages are AMBIT/G algorithms for ⊗→%3cdr%*↔←, ⊗→%3cons%*↔←,
⊗→%3eq%*↔←, and ⊗→%3atom%*↔←.
The following conventions pertain to these implementations:
%21.%* We assume AC1, AC2, ...,ACn are pointers pointing to the n arguments
of a function (requiring n arguments).
%22.%* We assume that every function, on completion of its action, will set
AC1 to point to the value of that function.
.TURN ON "#";
There are two basic shapes: ##%7[~~~]~~~]%*## and ##%7[~~~~~~]%*## corresponding
to free space and full word space objects. Pointers will be given a
separate shape:###%7A##%*; and when we describe the garbage collector
we will need a `marked-unmarked' shape: ##%7G%*## . One final point:
the `down arrow' on ##%7[~~~]~~~]%*## and ##%7[~~~~~~~]%*##, is usually implicit
in implementations. What does it represent?
******picture of shapes****
Notes: The right-side arrows on ##%7α~~~]%*## and ##%7[~~~]~~~]%*##
are used by the garbage collector. First we sweep from
##%7[~~~~~~]%4TOP%1# to##%7[~~~~~~~~~]%4BOTTOM%1#,##
setting these side arrows to %7G%4NA%1#;
when we mark, we reset those nodes which are reachable to point to %7G%4A%1.
Then the final sweep phase will collect all those nodes still marked,
%7G%4NA%1## into a new free space list, pointed to by FSP .
%7α[~~~~]%4<name>%1## is the atom header; <name> need not be present.
%7A%4MKR%1## is a pointer used during the garbage collection.
%7A%4P%1## is a pointer used to save the cdr parts of lists during
the marking phase of the garbage collector.
%7[~~~~]%4TOP%1,## %7[~~~~~~]%4BOTTOM%1## are used to delimit the boundaries of
free space.
The ⊗→garbage collector↔← has been slightly simplified. We should mark
more than just ⊗→OBLIST↔← (the symbol table); for example, what
%7A%4AC1%1 and %7A%4AC2%1 point to should be marked. There are other
intermediate results which must be preserved; these will become
apparent as we proceed.
We must also be careful about the order in which the dotted lines are
`executed'; often we will number the arrows to indicate the order of
execution.
Following this introduction is the main structure of a LISP garbage
collector. The heart of the collector is the marking algorithm.
Some care needs to be exercised here. Since we are marking binary list
structure in a sequential manner, we need to make a decision as to
whether to mark the ⊗→%3car%*↔←-part first or mark the ⊗→%3cdr%*↔←-part first.
Actually the order in which we do these operations isn't important.
What is important is that while we are marking the first-chosen
branch, we must remember where the other branch is so that we
can subsequently mark it. Also since this marking process is
obviously recursive -- branches may also have branches-- we
must be prepared to remember an arbitrary number of `other branches'.
The pointer, P, is used to help remember these pending branches.
As you examine the behavior of the AMBIT/G marking algorithm notice
that:
%21.%* We always mark the %3car%*-branch first.
%22.%* As we prepare to examine the %3car%*-part of a tree we save a pointer
to that node of the tree, using P.
%23.%* After we have marked the %3car%*-part, we use the information saved by
P to traverse to %3cdr%*-part.
P is pointing to a sequence of cells linked together by their
`down arrows'. When we go down the %3car%*-part, we save the parent-node
in the cell currently being pointed to by P. Then we set P to its
`down arrow' successor:
.BEGIN VERBATIM; group;
←----
|
--- -------- |
( P >=>| | | (1)
--- ---------\ |
\ || \--
\ ↓↓
\----------
| |
----------
.END
As long as the %3car%*-part of the structure we are marking points
into free space, we will continue to update P. If you look back
through the elements which we have added to P you will see a
history of those nodes whose %3car%*-parts have been marked, but whose
%3cdr%*-parts are still to be examined. When we finally terminate the
%3car%*-marking we will pick off the last node in P, decrement P, and
then traverse that substructure:
.BEGIN VERBATIM;group;
---------
---→| |
| ---------
/ ||
/ ↓↓
--- / ---------
( P >========>| |
--- ---------
.END
Thus, P and its associated cells are behaving as a stack.
Recall that we are storing information as list structure and thus
have intersecting branches.
Now since we do have intersections, the marking process can be a little
clever. As soon as the marker comes across a previously marked
cell we know that everything below that point in the structure has already
been marked. We can then terminate the marker on that branch.
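The marking discipline described above can be sketched in present-day Python rather than LISP or AMBIT/G (the tuple representation of cells and all names here are illustrative, not part of the system being described): the %3car%*-branch is descended first, pending %3cdr%*-branches are remembered on an explicit stack P, and the marker terminates early on a branch whose head cell is already marked.

```python
# Illustrative sketch of the marking phase. A cell is a (car, cdr)
# tuple; anything else is an atom. 'marked' collects cell identities.
def mark(node, marked):
    P = []                                # the pending-branch stack
    while True:
        # descend car-branches, marking as we go
        while isinstance(node, tuple) and id(node) not in marked:
            marked.add(id(node))          # mark this cell
            P.append(node)                # its cdr-part is still pending
            node = node[0]                # car-branch first
        # node is an atom or an already-marked cell: everything below
        # it is done, so resume with the most recent pending cdr-part
        if not P:
            return marked
        node = P.pop()[1]

shared = ("A", "B")
tree = (shared, (shared, "NIL"))          # 'shared' is reachable twice
len(mark(tree, set()))                    # 3 cells marked, not 4
```

The early exit on an already-marked cell is exactly the "little clever" termination mentioned above: shared substructure is traversed only once.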
.SS(Garbage collection again,garbage collection)
.SKIP TO COLUMN 1;
←%2An AMBIT/G garbage collector for LISP
.NEXT PAGE
←%2A LISP garbage collector for LISP%*
.NEXT PAGE
.SELECT 1;
.SS(The Contour Model,Contour Model,P31:)
Certain phases of the execution of LISP programs, in particular function
calls and variable bindings, are better described in another graphical
language, the Contour Model.
This model, though well known intuitively by any translator writer, was
recently made explicit and embellished by J. Johnston. Its ability
to clearly show the binding mechanisms in block-structured languages
is exceptional.
Note first that this model is of a different type from AMBIT/G. The
Contour Model is essentially a trace of the execution of a specific program, whereas
AMBIT/G is a reasonable candidate for describing the semantics of a
complete language. Indeed, AMBIT/G has been used to document the extensible
language, BASEL, and LISP.
The simplest way to introduce the Contour Model is by example:
.SKIP TO COLUMN 1;
.NEXT PAGE;
.SS(Macros and special forms,macros,P18:)
Most of the discussion to this point has dealt with getting
a new efficient symbol table which can fulfill our requirements
of permanence and multiplicity of properties. Little has been said
about user controlled function calling, which we called the desire for
generality. Before introducing the definition of the new evaluator
we shall deal with that aspect.
Recall our discussion of ⊗→special forms↔← in {yonss(P8)}.
Special forms have been used for two purposes: to control the evaluation
of arguments (conditional expressions, quoted expressions,
%3and, or%*, etc.), and to create the effect of functions with an indefinite
number of arguments (%3list, append, plus, ...%*).
Frequently this second application of special forms can be improved upon,
particularly in the presence of a compiler.
Describing such functions as special forms is sufficient,
but not efficient, since the body of the definition must
contain explicit calls on %3eval%*. Even though we wish to define
some functions as if they had an arbitrary number of arguments, when
we %6use%* (i.e. call) the function, the number of arguments to be
applied %6is%* known. In particular, when the compiler is examining the
program, the number of arguments is known. Hopefully the compiler can be
made smart enough to compile better code than calls on %3eval%*.
That hope is well-founded.
Assume, for example, we wish to define %3plus%* as a function with
an indefinite number of arguments:
.BEGIN CENTERIT;SELECT 3;GROUP;
←plus[4;5] = 9
←plus[4;5;6] = 15
←plus[4;add1[2];4] = 11
.END
That is, %3plus%* is to have the properties of a function: its arguments
are to be evaluated (from left to right); but it can take an arbitrary number.
What is needed here seems clear: define %3plus%* in terms of a %2binary%*
addition function, %3*plus%*.
.BEGIN CENTERIT;SELECT 3;GROUP;
←plus[4;5] = *plus[4;5] = 9
←plus[4;5;6] = *plus[4;*plus[5;6]] = 15
←plus[4;add1[2];4] = *plus[4;*plus[add1[2];4]] = 11
.END
That is, we %2expand%* calls on %3plus%* into a composition of calls on
%3*plus%*.
%3plus%* is being used as a %2macro%* and the expansion process in terms
of %3*plus%* is called %2macro expansion%*. Notice that this macro expansion
can be done by a compiler, generating a sequence of calls on %3*plus%*.
Realize too, that since LISP programs must perform equivalently when
interpreted, we must recognize a macro construction inside %3eval%*.
How are macros written and how do they work? The body of a macro is a
λ-expression of %2one%* argument. The %2use%* of a macro looks just like
an ordinary function call, but what is bound to the λ-variable is the whole
call on the macro. When you look at the evaluation mechanism for
macros in the %aSM%*-%3eval%* you will see that the result of the macro
expansion is passed to %3eval%*. Thus the task of the macro body is to
expand the macro call and return this expansion as its value.
The task of the compiler will be quite similar; it will expand the macro
but instead of evaluating the expansion it must %2compile%* it.
.BEGIN TURN ON "#";
Let's define %3<%4m%*=%1 to mean "#is#defined#to#be#the#macro#...".
Then a simple macro definition of %3plus%* might be:
.END
.BEGIN SELECT 3;TABIT2(10,25);GROUP;
.P58:
\plus <%4m%*= λ[[l]\[eq[length[l];3] → cons[*PLUS;cdr[l]]
\\ T → list[*PLUS;cadr[l];cons[PLUS;cddr[l]]]] ]
.END
Thus a call %3(PLUS 3 4 5)%* would bind %3l%* to %3(PLUS 3 4 5)%* and
the evaluation of the body would result in %3(*PLUS 3 (PLUS 4 5))%*.
Evaluation of this expression would result in another call on the macro.
This time %3l%* would be bound to %3(PLUS 4 5)%*. Now
%3eq[length[l];3]%* %2is%* true and the value returned is %3(*PLUS 4 5)%*.
This can be evaluated, giving 9, and finally the outermost call on %3*PLUS%*
has all its arguments evaluated, and we get the final answer, 12.
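The expansion-then-re-evaluation cycle just traced can be sketched in present-day Python rather than LISP (sexpressions are modeled as Python lists, and %3expand_plus%*/%3eval_plus%* are illustrative stand-ins for the macro body and for %3eval%*'s handling of macros):

```python
# Illustrative sketch of the plus macro. 'call' is the whole macro
# call, e.g. ["PLUS", 3, 4, 5] -- just as l is bound to the whole call.
def expand_plus(call):
    if len(call) == 3:                       # eq[length[l];3]
        return ["*PLUS"] + call[1:]          # cons[*PLUS; cdr[l]]
    return ["*PLUS", call[1], ["PLUS"] + call[2:]]

def eval_plus(form):
    if isinstance(form, int):
        return form
    if form[0] == "PLUS":                    # a macro: expand, then
        return eval_plus(expand_plus(form))  # re-evaluate the expansion
    if form[0] == "*PLUS":                   # the binary primitive
        return eval_plus(form[1]) + eval_plus(form[2])

expand_plus(["PLUS", 3, 4, 5])   # ["*PLUS", 3, ["PLUS", 4, 5]]
eval_plus(["PLUS", 3, 4, 5])     # 12
```

Note that the macro body returns a %2form%*; the evaluator, not the macro, does the arithmetic.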
This is a very simple (and inefficient) macro definition. Since the body
of a macro has available all of the evaluation mechanism of LISP, and
since the program structure of LISP is also the data structure, we can
perform arbitrary computations inside the expansion of the macro. Truly
LISP macros have the power to cloud men's minds!!!
Notice that %3SETQ%* can easily be defined as a macro
over %3SET%*:
.BEGIN CENTERIT;SELECT 3;GROUP;
←setq <%4m%*= λ[[l] list[SET;list[QUOTE;cadr[l]];caddr[l]]].
.END
.P54:
The effect of "<%4m%*=" should be clear:
it puts the body of the macro definition on the property-list of the macro
name under the indicator, %3MACRO%*. Likewise "<=" puts the function body
on the p-list under the indicator, %3EXPR%*. Similarly, we will define
"<%4f%*=" to mean "..is defined to be a special form..." and whose effect
is to put the body under the indicator, %3FEXPR%*. The body of a fexpr
definition is a λ-expression of a single λ-variable, say %3l%*. When the
fexpr is called we bind the list of %2unevaluated%* arguments to %3l%*.
Thus we could define %3plus%* by:
.BEGIN SELECT 3; GROUP;TABIT2(10,12);turn off "←";
.P59:
plus <%4f%*= λ[\[l] prog[[sum]
\\sum ← 0;
\a\[null[l] → return[sum]];
\\sum ← *plus[sum;eval[car[l]]];
\\l ← cdr[l];
\\go[a]]]
.END
Or %3and%* could be defined as:
.BEGIN SELECT 3;CENTERIT;GROUP;
←and <%4f%*= λ[[l]evand[l]]%1
where %3evand%* is defined on {yon(P53)}.
.END
Macros can be used to perform the operations which special forms
do, but with perhaps more efficiency.
The idea of macro processing is not recent. Some of the earliest assemblers
had extensive macro facilities. Lately macros have been used as a means
of extending so-called high level languages.
One of the most simple kinds of macros is %2textual substitution%*.
That is, when a use of a macro is detected we simply replace the call by its
body. A slightly more sophisticated application is the %2syntax macro%*.
Every time we come across an application of a syntax macro the expander
processes it as if it had never been seen before even though much
of the expansion is repetitious. That is, syntax macros have no memory.
%2Computational macros%* are an attempt to reduce some of this repetition.
In this scheme a certain amount of processing is done at the time the macro
is %2defined%*. For example a computational subtree reflecting the body of the
macro might be formed. Then whenever the macro is %2used%* we can
simply make a copy of this subtree and "glue" this subtree into the parse-tree
which we are building. This computational subtree is commonly formed by passing
the body of the macro through the compiler in a
"funny" way. The main problem with the computational macro is that there are
many desirable macros which have no such subtree, or there is other information
necessary to process the macro. There are solutions to this problem,
one of which closely parallels the abstract syntax ideas of McCarthy.
All of these styles of macros are subsets of the LISP macros.
Many versions of LISP also have a very simple syntax macro (called ⊗→%3read%* macros↔←)
independent of
the <%4m%*= - facility. Constants appear quite frequently in LISP expressions;
and so we have already introduced abbreviations for the sexpression
representation of numbers, %3T%* and %3NIL%*. That is we don't %3QUOTE%* them.
%3read%* macros are used to abbreviate other sexpressions. The most
common %3read%* macro is used to abbreviate the construct:
.TURN ON "←";
←%3(QUOTE %*<sexpression>%3)%*.
A special character is chosen to designate the macro, say "@". Then
whenever %3read%* sees "@" the next sexpression, s, is read in and
the value: %3(QUOTE %1s%*)%1 is returned by %3read%*. Thus:
←@<sexpression> is an abbreviation for %3(QUOTE <sexpression>).%1
In some systems the user may define his own %3read%* macros, in others
they are pre-defined.
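The "@" %3read%* macro can be sketched in present-day Python rather than LISP (the toy reader below handles only atoms, parenthesized lists, and "@"; it is an illustration of the idea, not the full %3read%*):

```python
# Illustrative sketch of a reader with an "@" read macro.
# 'tokens' is a list of already-scanned tokens.
def read_sexpr(tokens, pos=0):
    """Read one sexpression; return (sexpr, position after it)."""
    tok = tokens[pos]
    if tok == "@":                       # read macro: @e => (QUOTE e)
        e, pos = read_sexpr(tokens, pos + 1)
        return ["QUOTE", e], pos
    if tok == "(":                       # an ordinary list
        lst, pos = [], pos + 1
        while tokens[pos] != ")":
            e, pos = read_sexpr(tokens, pos)
            lst.append(e)
        return lst, pos + 1
    return tok, pos + 1                  # an atom

read_sexpr(["@", "A"])[0]                # ["QUOTE", "A"]
read_sexpr(["(", "CONS", "@", "B", ")"])[0]
# ["CONS", ["QUOTE", "B"]]
```

The abbreviation is purely a matter for the reader: by the time %3eval%* sees the form, only %3(QUOTE ...)%* remains.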
.SS(Problems)
I. Define %3list%* and %3append%* as macros. You may use only
the LISP primitives (functions, predicates, conditionals and
recursion) and a binary function, %3*append%*.
II. Give a macro definition of an extended %3SETQ%*, which is called
as %3(SETQ#%1var%41%*#exp%41%*#...#var%4n%*#exp%4n%3)%1.
Each var%4i%* is a name; each exp%4i%* is an expression to be evaluated
and assigned to var%4i%1. The assignments should go from "left-to-right".
Thus %3(SETQ X 2 Y (TIMES 2 X) X 3)%* when executed should assign
%33%* to %3X%* and %34%* to %3Y%*.
III. Define %3list%* as a fexpr.
.SS(%aSM%*-%3eval%*,%3eval%*,P30:)
Finally, here is the new evaluator.
(the `line numbers' are not part of the definitions)
.BEGIN VERBATIM;SELECT 3;
eval <=
1. λ[[x]prog[[y]
2. return[[numberp[x] → x;
3. atom[x] → [y ← get[car[x];VALUE] → cdr[y]]
T → err[UNBOUND VARIABLE]];
4. atom[car[x]] → [y ← getl[car[x];(EXPR FEXPR MACRO)]
5. → [eq[car[y];EXPR] → apply[cadr[y];mapcar[function[eval];cdr[x]]];
6. eq[car[y];FEXPR] → apply[cadr[y];list[cdr[x]]];
7. T → eval[apply[cadr[y];list[x]]]];
8. y ← get[car[x];VALUE] → eval[cons[cdr[y];cdr[x]]];
9. T → err[UNDEFINED FUNCTION]];
10. T → apply[car[x];mapcar[function[eval];cdr[x]]]]]]]
apply <=
λ[[fn;args]
11. [atom[fn] → [get[fn;EXPR] → apply[get[fn;EXPR];args];
12. T → apply[eval[fn];args]];
13.   eq[car[fn];LAMBDA] → prog[[z]
14. bind[cadr[fn];args];
15. z ← eval[caddr[fn]];
16. unbind[];
17. return[z]];
18. T → apply[eval[fn];args] ]]
.END;
First let's describe the new LISP system functions which are used here.
They are:
.BEGIN INDENT 0,10;
.p52:
%3get[x;y]: %*
%3x%* is an atom; %3y%* is an indicator. ⊗→%3get%*↔← will return the value-part
of the attribute-value pair
associated with %3y%* under the atom %3x%*. If %3x%* does not have the
indicator %3y%*, then %3NIL%* is returned.
.END
.BEGIN INDENT 0,10;
%3getl[x;y]:%*
%3x%* is an atom; %3y%* is a list of indicators (see line 4). ⊗→%3getl%*↔←
will search the property list of the atom %3x%* for the first occurrence
of an indicator which appears in the list %3y%*. If such a match
is found, then the remainder of the p-list, beginning with the
matching indicator, is returned. If no match is found, %3NIL%* is
returned.
.END
.BEGIN CENTERIT;GROUP;
An example:
.GROUP SKIP 6;
←%3getl[FOO;(BAZ PNAME)]%* has value: %3getl[FOO;(PNAME)]%* has value:
.END
The virtue of %3getl%* is that it can distinguish between an atom which has
an indicator with value %3NIL%* and an atom which does not have the indicator.
%3get%* cannot communicate this distinction.
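This distinction can be sketched in present-day Python rather than LISP (a p-list is modeled as a flat list of alternating indicators and values; the names and the use of %3None%* for %3NIL%* are illustrative only):

```python
# Illustrative sketch of get vs. getl on a property list.
def get(plist, indicator):
    """Value paired with indicator, or None if the indicator is absent.
    Cannot distinguish 'bound to NIL' from 'no such indicator'."""
    for i in range(0, len(plist), 2):
        if plist[i] == indicator:
            return plist[i + 1]
    return None

def getl(plist, indicators):
    """Tail of plist beginning at the first matching indicator,
    or None if no indicator in the list matches."""
    for i in range(0, len(plist), 2):
        if plist[i] in indicators:
            return plist[i:]
    return None

plist = ["PNAME", "FOO", "VALUE", None]   # FOO's VALUE-cell holds NIL
get(plist, "VALUE")                       # None -- looks unbound!
getl(plist, ["VALUE"])                    # ["VALUE", None] -- bound to NIL
```

Since %3getl%* returns the tail of the p-list rather than the stored value, a non-%3NIL%* result always means the indicator was present.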
First, on lines 5 and 10 we use the LISP mapping function, %3mapcar%*, to evaluate the argument
list during function calls. See {yon(P34)}.
Notice line %33.%* is an application of an obscene LISP coding trick
described on {yon(P28)}. Notice too, that line %33%* uses %3get%*. You
should understand why %3get%* works and %3getl%* is not required.
There are enough important new details in this %3eval-apply%* family
to warrant spending a bit of time. Most of the changes involve symbol table
operations. %3eval%* (and %3apply%*) no longer carry an explicit symbol table.
The property-list of the atom contains the information now.
In this new scheme of things, %3get%* and %3getl%* search the symbol table
(p-list) for the appropriate bindings.
%3eval%* first checks if it has been given a number; any other atomic
expression given to %3eval%* is expected to be a variable and to
have a ⊗→%3VALUE%*-cell↔←.
If the %3car%* of the expression given to %3eval%* is atomic, then the
p-list of that atom is examined for one of three indicators.
We have already seen ⊗→%3EXPR%*↔← on {yon(P29)}, it is the indicator designating
a function represented as a λ-expression.
Evaluate the arguments and call %3apply%*. The indicator ⊗→%3FEXPR%*↔←,
designates a special
form. Pass the %2un%*evaluated argument list to %3apply%*.
The indicator, %3MACRO%*, is recognized on line %37%*.
There are actually two other indicators recognized by %3eval%*. ⊗→%3SUBR%*↔← is
the machine-language version of an %3EXPR%*; and ⊗→%3FSUBR%*↔← is the machine
version of a %3FEXPR%*.
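The indicator dispatch on lines %34%* through %37%* can be sketched in present-day Python rather than LISP (the symbol table below is a made-up illustration; real p-list order, %3SUBR%*s, and error handling are omitted):

```python
# Illustrative sketch of eval's dispatch on EXPR, FEXPR, and MACRO.
def eval_(form, table):
    if isinstance(form, int):
        return form                            # numberp[x] -> x
    if isinstance(form, str):                  # a variable: VALUE-cell
        return table[form]["VALUE"]
    fn, args = form[0], form[1:]
    props = table.get(fn, {})
    if "EXPR" in props:                        # evaluate args, apply
        return props["EXPR"](*[eval_(a, table) for a in args])
    if "FEXPR" in props:                       # pass UNevaluated args
        return props["FEXPR"](list(args))
    if "MACRO" in props:                       # expand, re-evaluate
        return eval_(props["MACRO"](form), table)
    raise NameError("UNDEFINED FUNCTION")

table = {
    "X":     {"VALUE": 2},
    "ADD1":  {"EXPR": lambda n: n + 1},
    "QUOTE": {"FEXPR": lambda l: l[0]},
}
eval_(["ADD1", "X"], table)     # 3: the argument X is evaluated
eval_(["QUOTE", "X"], table)    # "X": the argument is not evaluated
```

The three indicators thus correspond to three calling conventions: evaluated arguments, the unevaluated argument list, and the whole call.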
.SS(Binding revisited,bind)
The most interesting changes in the evaluator involve binding
and unbinding of variables.
We have seen that the old symbol table mechanism has the correct
effect for the proper evaluation of recursive functions. As
we bound variables to values on entry to a λ-expression, we
%3cons%*ed the new bindings onto the front of the symbol table. This
had two results:
%21.%* In the evaluation of the body of the λ-expression we got the
correct bindings of the variables.
%22.%* The old value was saved so that we could retrieve it on leaving
the λ-body.
Now with the new table organization we need to develop the same
facility. It is not sufficient for the binding of λ-variables to
simply change the contents of the ⊗→%3VALUE%*-cell↔←. This would violate
condition %22%*. Besides changing the value, we must save the prior
value.
The crucial phrases are lines 14 and 16 in the new version of %3apply%*.
⊗→%3bind%*↔← is a function, taking two arguments. The first is the list of
λ-variables; the second is the list of new values. %3bind%* saves the
old values and rebinds the λ-variables to the new values.
⊗→%3unbind%*↔← is a function of no arguments used to restore a list
of λ-variables to their previous values. How is
this wonderous feat performed?
We have a ⊗→stack↔←, or ⊗→pushdown-list↔←, called SP in which we can
save old values of variables. %3bind%* will add values to SP; %3unbind%*
restores the saved values. Actually because of the stack-like
behavior of the binding and unbinding processes we can increase
the efficiency of %3bind%* and %3unbind%*. %3bind%* saves both the value
and the location of the %3VALUE%*-cell for each variable being rebound.
It also puts special marks in the SP stack delimiting each block
of λ-rebindings. All %3unbind%* need do is restore the first block
of saved values. It knows the values and also knows the addresses of
the %3VALUE%*-cells.
All of the information needed for restoration is saved in the stack.
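The block-structured behavior of %3bind%* and %3unbind%* can be sketched in present-day Python rather than LISP (each %3VALUE%*-cell is modeled as a one-element list so that its "address" can be saved; the %3None%* mark delimiting each block is an illustrative device):

```python
# Illustrative sketch of bind/unbind against a stack SP.
SP = []

def bind(cells, new_values):
    SP.append(None)                     # mark the start of this block
    for cell, new in zip(cells, new_values):
        SP.append((cell, cell[0]))      # save the cell and its old value
        cell[0] = new                   # rebind by smashing the cell

def unbind():
    while SP[-1] is not None:           # restore the first block only
        cell, old = SP.pop()
        cell[0] = old                   # address and value both saved
    SP.pop()                            # discard the mark

x_cell = ["OLD"]
bind([x_cell], ["NEW"])
x_cell[0]        # "NEW" while the lambda-body is evaluated
unbind()
x_cell[0]        # "OLD" again
```

Because each %3bind%* pushes a delimited block, nested λ-entries and exits match up exactly as the old %3cons%*-onto-the-table scheme did.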
.BEGIN GROUP;TABIT3(30,45,60);
Given %1λ[[x%41%*; ...; x%4n%*]%9x%1][a%41%*; ... ;a%4n%*], here's an example of binding:
\ to value\\old value of x%41%*
\ cell for x%41%*
\save
\and\ ...
\rebind
\ ===>
\ to value\\old value of x%4n%*
\ cell for x%4n%*
\\ |
\\ ↓
\\rebind x%4i%* to a%4i%*, changing the
\\ contents of the value cell
\\ (see %3*bind%*)
\\ |
\\ ↓
\\ evaluate %9x%*
\\ |
\\ ↓
\\restore old values using information
\\ stored in stack, SP.
\\ (see %3*unbind%*)
.END
.BEGIN CENTERIT;
.GROUP SKIP 15;
←%2The primitives, %3*bind%* and %3*unbind%*.
.END
.<<PROBS OF BINDING AND FEXPRS>>
.BEGIN TURN ON "←";GROUP;
←%2Problems%*
I Evaluate the following using the new %3eval%*:
1. %3eval[((LAMBDA(L)(PRINT L))(QUOTE A))]%* assuming %3print%* is the usual printing function.
2. %3eval[(FOO (QUOTE A))]%* where %3foo <= λ[[l]print[l]]%*.
3. %3eval[((CAR (QUOTE (FOO)))(QUOTE A))]%*, %3foo%* as above.
4. %3eval[(FOO1 (QUOTE A))]%* where %3foo1 <%4f%*= λ[[l]print[l]]%1.
and finally:
5. %3eval[((CAR (QUOTE (FOO1)))(QUOTE A))]%*.
Explain what happened. Can you "fix" it?
II Some implementations of LISP distinguish between %3EXPR%*s and %3FEXPR%*s
by modifying the prefix, %3LAMBDA%*. %3FEXPR%*s are stored with prefix %3NLAMBDA%*;
%3EXPR%*s are stored with the normal prefix %3LAMBDA%*. What are the
advantages and/or disadvantages of this scheme?
III Recall the extensions to LISP binding proposed in {yonss(P67)}.
Discuss their implementation in light of the new %3eval%* and symbol table.
.END
.SS(Review)
As things presently stand we have a basic LISP machine consisting
of an encoding of %3eval%* and friends, the I/O routines, the garbage
collector, an initial symbol table which believes in the primitive
functions and some commonly needed utility functions. We have
two areas of memory: free space, and full word space.
Expressions (forms) to be evaluated are read in, converted to list structure,
and evaluated by %3eval%*. What %3eval%* is doing is traversing the
sexpression representation of the form, interpreting the
information found there as LISP "instructions". This is an expensive process.
In the general case there is nothing we can do to reduce this cost, but very
frequently there is a mapping from the LISP expressions to be
evaluated to a sequence of actual machine instructions, which when
carried out will have the same computational effect as %3eval%*, massaging
the list structure. This mapping is called a ⊗→compiler↔←. %3eval%*
and friends are called an %3interpreter%*.
What we wish to do now is describe a compiler for a subset of LISP.
This has many motivations:
%21.%* To describe the compiler we must carefully define the machine
which will execute the instructions which the compiler produces.
Examination of a machine will reinforce the ideas of stacks, make
more concrete the structure of recursion, of garbage collection and
representation of sexprs and list structure.
%22.%* It will show a very non-trivial application of non-numerical
computation. At the same time we will see how simple it is to
describe such complex algorithms in LISP.
%23.%* It will clearly show the relationship between compilation and
evaluation. That is, the LISP function representing the compiler
will very closely parallel the structure of the evaluator, %3eval%*.
If you understand %3eval%*, then the compiler is easy.
%24.%* Everyone should see a compiler ("%2Shut up!%*", he explained).
.SS(%aSM%*:A Simple machine,%aSM%*,P33:)
This section describes a simple machine which
has a sufficient instruction set to handle an implementation of LISP.
Before we begin, note that this machine is %2not%* necessary for our
understanding of %3eval%*. %3eval%* is self-contained. We need only describe
a machine to discuss implementation and compilation. This,
indeed, is an objection to describing the meaning of programming languages
in terms of a compiler -- you must understand %2two%* things now -- the
language %2and%* a machine.
The simple machine, %aSM%*, has a slight similarity to the organization of the
PDP-10. We need very few features to adequately discuss the interesting
facets of implementation
of our LISP subset. Certainly, if we were to undertake a real implementation, many
more instructions would be necessary. Similarly, when we discuss compilation
our %aSM%* suffices, but if we wished to perform %2efficient%* compilation
we would hopefully have a better instruction set. The point here
is to understand basic algorithms. If that is accomplished it is quite
straightforward to examine problems of efficiency, and details of implementation.
%aSM%* has a conventional addressable main memory, and n special registers,
AC1, AC2, ..., ACn. These registers are called %2accumulators%*
and as with the AMBIT/G description, these n registers are to contain pointers
to the arguments of a function at the time of invocation.
Each memory location or register is assumed to be large enough to contain
two addresses. For sake of discussion, assume the word size is 36 bits.
Assume the addressing space is then 2%818%*.
The mapping of a %7[~~~]~~~]%* onto a %aSM%* location is easy; the %3car%*
maps to the left-half of the word; the %3cdr%*, to the right.
A memory area to contain full-words (%7 [~~~~~~]%* ) is designated.
All p-names and numbers are stored here.
Parts of %aSM%* memory can be designated as stacks. Each stack is a contiguous
area of memory, and the current top of the stack is referenced by a general
register, P1, ..., Pn, called a %2stack-%* or %2pushdown-%* pointer.
The stacks will be used to contain the partial results of calculations and
will contain the information necessary to implement recursive function calling.
The primary LISP stack is named P.
There are only three main classes of instructions necessary to describe
LISP implementation: instructions for constant generation, instructions for
stack manipulation, and instructions for control of flow.
The control instructions and some of the stack instructions refer to the
program counter of %aSM%*. %aPC%* designates this counter. %3C%* in the following
means "contents of...".
.BEGIN TABIT1(19);TURN OFF "←";
MOVEI ACi const\%3C%1(ACi) ← const
PUSH P ACj\%3C%*(%3C%*(P)) ← %3C%*(ACj)
\%3C%*(P) ← %3C%*(P)+1. Push contents of ACj onto top of stack.
POP P ACj\%3C%*(P) ← %3C%*(P)-1
\%3C%*(ACj) ← %3C%1(%3C%*(P)). Pop top of stack into ACj.
POPJ P\P ← %3C%*(%3C%*(P))-1
\%3C%*(%aPC%*) ← %3C%*(%3C%*(P)). Pop top of stack into %aPC%*; used as return.
.BEGIN INDENT 0,19;FILL;
CALL i fn\This instruction handles function calling. i is the number of arguments
(assumed to be in AC1 through ACi at time of call). fn represents a function name.
One effect of CALL is to leave return information on the stack, P.
.END
MOVE ACi -j P\%3C%*(ACi) ← copy of the j%8th%* element (counting from zero%8th%*) of stack P.
MOVEM ACi -j P\j%8th%* element of stack P is replaced by contents of ACi.
.BEGIN INDENT 0,19;FILL;
SUB x y\%3C%*(x) ← %3C%*(x) - %3C%*(y); Used in the context, SUB P const, to remove
a block of cells from the top of the stack.
.END
JRST j\%3C%*(%aPC%*) ← j. Go to loc j.
JUMPE ACi j\%2if%* %3C%*(ACi)=0 %2then%* %3C%*(%aPC%*) ← j;
JUMPN ACi j\%2if%* %3C%*(ACi)%c≠%*0 %2then%* %3C%*(%aPC%*) ← j;
.END
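The stack discipline behind PUSH, POP, MOVE ACi -j P, and SUB can be sketched in present-day Python rather than %aSM%* code (the class and method names are illustrative; the indexing convention shown here, counting back from the top, is one plausible reading of "counting from zeroth"):

```python
# Illustrative sketch of an SM stack.
class Stack:
    def __init__(self):
        self.cells = []
    def push(self, v):                  # PUSH P ACj
        self.cells.append(v)
    def pop(self):                      # POP P ACj
        return self.cells.pop()
    def ref(self, j):                   # MOVE ACi -j P: copy, no pop
        return self.cells[-1 - j]
    def drop(self, n):                  # SUB P n: discard a block
        del self.cells[len(self.cells) - n:]

P = Stack()
P.push("return-address")                # left by CALL
P.push(3); P.push(4)                    # partial results
P.ref(0)       # 4, the top element
P.ref(1)       # 3, one below it
P.drop(2)      # remove the two partial results
P.pop()        # "return-address", as POPJ P would use
```

Note that MOVE and MOVEM reference the stack without changing the stack pointer; only PUSH, POP, POPJ, and SUB move it.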
For much of the sequel, the only calling sequence of interest will be
ordinary function calls. The evaluated arguments are to be loaded into
the accumulators.
Internal λ-expressions are not handled yet. (See problem IV on {yon(P36)}).
More instructions for %aSM%* will be given on {yon(P35)} when we
discuss efficiency.
.END "JRA";